Extract text and structured content from PDFs, images, Word documents, and plain text files using OCR and natural language processing.
Our API processes documents, images, and PDFs and returns the extracted information as structured JSON, as in this sample response:
{
  "text": "Sample extracted document content.",
  "entities": {
    "names": ["John Smith"],
    "organizations": ["Acme Corp"],
    "dates": ["January 15, 2023"],
    "locations": ["New York"],
    "emails": ["john@example.com"]
  },
  "metadata": {
    "filename": "document.pdf",
    "pages": 2,
    "language": "eng"
  }
}
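
As a rough illustration of consuming a response shaped like the sample above, the Python sketch below parses the payload with the standard `json` module and pulls out a few fields. The helper name `summarize` is hypothetical and not part of the API; the field names are taken from the sample only, so a real response may contain more or different keys.

```python
import json

# The sample response from this page, embedded as a string for illustration.
SAMPLE_RESPONSE = """
{
  "text": "Sample extracted document content.",
  "entities": {
    "names": ["John Smith"],
    "organizations": ["Acme Corp"],
    "dates": ["January 15, 2023"],
    "locations": ["New York"],
    "emails": ["john@example.com"]
  },
  "metadata": {
    "filename": "document.pdf",
    "pages": 2,
    "language": "eng"
  }
}
"""


def summarize(response_body: str) -> dict:
    """Return a small summary of an extraction response (hypothetical helper)."""
    data = json.loads(response_body)
    # Use .get() so a response without entities or metadata does not raise.
    entities = data.get("entities", {})
    metadata = data.get("metadata", {})
    return {
        "filename": metadata.get("filename"),
        "pages": metadata.get("pages"),
        "language": metadata.get("language"),
        # Count how many values were found for each entity type.
        "entity_counts": {kind: len(values) for kind, values in entities.items()},
        "emails": entities.get("emails", []),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_RESPONSE))
```

Reading fields with `.get()` rather than indexing is a defensive choice here, since this page does not state which keys are guaranteed to be present.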
Extract text from images and scanned documents with our OCR technology, which supports multiple languages and a range of document formats.
Automatically identify and extract entities like names, organizations, dates, and contact information from your documents.
Built with security in mind. Our API uses token-based authentication and secure connections to protect your data.
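
To make the token-based authentication concrete, here is a minimal Python sketch that builds an authenticated upload request without sending it. Everything specific in it is an assumption: the endpoint URL `https://api.example.com/v1/extract` is a placeholder, and this page does not say how the token is transmitted or how files are uploaded, so the `Authorization: Bearer` header and the raw-bytes body are guesses at a common convention rather than the API's documented behavior.

```python
import urllib.request

# Hypothetical endpoint: the real URL is not given on this page.
API_URL = "https://api.example.com/v1/extract"


def build_request(token: str, file_bytes: bytes, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) an authenticated upload request."""
    # Only allow HTTPS so the token never travels over a plain connection.
    if not url.startswith("https://"):
        raise ValueError("refusing to send a token over a non-HTTPS connection")
    if not token:
        raise ValueError("an API token is required")
    return urllib.request.Request(
        url,
        data=file_bytes,
        method="POST",
        headers={
            # Assumed scheme: bearer token in the Authorization header.
            "Authorization": f"Bearer {token}",
            # Assumed upload format: raw file bytes in the request body.
            "Content-Type": "application/octet-stream",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the headers easy to inspect and test without network access.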
Document Type | Support Level | Features |
---|---|---|
PDF | Full | Text extraction, OCR for scanned documents, metadata extraction |
Images (PNG, JPG, TIFF) | Full | OCR text extraction, image preprocessing, metadata extraction |
Word Documents (DOCX) | Full | Text extraction, table extraction, metadata extraction |
Plain Text (TXT) | Full | Text processing, entity extraction |
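
A client might check a file against this table before uploading it. The Python sketch below maps file extensions to the document types listed above; the function name `document_type` is hypothetical, the `.pdf` entry assumes the table's first row refers to PDF files (as the introduction suggests), and only the extensions named in the table are included.

```python
from pathlib import PurePath
from typing import Optional

# Extension-to-type mapping taken from the support table.
SUPPORTED_TYPES = {
    ".pdf": "PDF",  # assumed label for the table's first row
    ".png": "Images",
    ".jpg": "Images",
    ".tiff": "Images",
    ".docx": "Word Documents",
    ".txt": "Plain Text",
}


def document_type(filename: str) -> Optional[str]:
    """Return the table's document type for a filename, or None if unlisted."""
    # Compare case-insensitively so "SCAN.PNG" matches ".png".
    return SUPPORTED_TYPES.get(PurePath(filename).suffix.lower())


if __name__ == "__main__":
    for name in ("report.pdf", "SCAN.PNG", "notes.md"):
        print(name, "->", document_type(name))
```

Whether the service also accepts variants such as `.jpeg` or `.tif` is not stated here, so they are deliberately left out.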