The Document Processing API extracts text and structured content from PDFs, Microsoft Office documents, and images. It includes OCR for processing scanned documents.
This API follows RESTful principles and returns data in JSON format.
All API requests require authentication using an API key. You can obtain an API key from your dashboard after signing up.
Include your API key in one of the following ways:
- As a request header: X-API-Key: your_api_key_here
- As a query parameter: https://api.example.com/documents?api_key=your_api_key_here
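The two ways of passing the key described above can be sketched in Python as follows. This is a minimal illustration using only the standard library; the helper names `auth_headers` and `with_api_key_param` are hypothetical and not part of the API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

API_KEY_HEADER = "X-API-Key"  # header name taken from the documentation
API_KEY_PARAM = "api_key"     # query parameter name taken from the documentation


def auth_headers(api_key: str) -> dict[str, str]:
    """Return request headers that carry the API key."""
    return {API_KEY_HEADER: api_key}


def with_api_key_param(url: str, api_key: str) -> str:
    """Return `url` with the API key appended as a query parameter."""
    parts = urlsplit(url)
    # Keep any query parameters that are already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((API_KEY_PARAM, api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(auth_headers("your_api_key_here"))
    print(with_api_key_param("https://api.example.com/documents", "your_api_key_here"))
```

Where both options are available, the header form is usually the safer choice, since query strings tend to end up in server logs and browser history.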
Upload a document for processing by sending a POST request to /api/documents.
The endpoint accepts either multipart/form-data with a file upload or application/json with a file URL.
Form Data Parameters:
- file - The document file to upload
- options (optional) - JSON string of processing options
JSON Body Parameters:
- url - URL of the document to download
- options (optional) - Processing options
Processing Options:
- language - OCR language code (default: "eng")
- ocr - Force OCR processing (default: auto-detected)
- extract_entities - Extract named entities (default: false)
- dpi - DPI for image processing (default: 300)
Response:
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued",
"submitted_at": 1616161616.123,
"status_url": "https://api.example.com/api/documents/550e8400-e29b-41d4-a716-446655440000/status"
}
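As an illustration of the JSON variant of the upload request, the sketch below builds (but does not send) a request with Python's standard library. The function name `build_url_upload_request` is hypothetical, the host is the placeholder used elsewhere in this document, and leaving `ocr` out of the body to get auto-detection is an assumption based on the documented default.

```python
import json
from typing import Optional
from urllib.request import Request

# Placeholder endpoint taken from the curl examples in this document.
UPLOAD_URL = "https://api.example.com/api/documents"


def build_url_upload_request(
    api_key: str,
    document_url: str,
    language: str = "eng",
    ocr: Optional[bool] = None,
    extract_entities: bool = False,
    dpi: int = 300,
) -> Request:
    """Build a POST request asking the API to download and process a document."""
    options = {"language": language, "extract_entities": extract_entities, "dpi": dpi}
    if ocr is not None:
        # Assumption: omitting "ocr" leaves the choice to the API's auto-detection.
        options["ocr"] = ocr
    body = json.dumps({"url": document_url, "options": options}).encode("utf-8")
    return Request(
        UPLOAD_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_url_upload_request(
        "your_api_key_here",
        "https://example.com/path/to/document.pdf",
        extract_entities=True,
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body should then contain the `job_id` and `status_url` fields shown above.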
Check the status of a document processing job with a GET request to /api/documents/{job_id}/status.
Path Parameters:
- job_id - The ID of the job to check (from the upload response)
Response:
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed", // Or "queued", "processing", "failed"
"submitted_at": 1616161616.123,
"completed_at": 1616161620.456
}
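Because processing is asynchronous, a client typically polls the status endpoint until the job reaches "completed" or "failed". The following is a minimal polling sketch, not a prescribed client: `wait_for_job` is a hypothetical helper, `fetch_status` is a caller-supplied function that performs the HTTP call and returns the decoded JSON, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# "queued" and "processing" are the other documented status values.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is completed or failed, then return the last status payload."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {status.get('status')!r} after {timeout}s"
            )
        sleep(interval)


if __name__ == "__main__":
    # Simulated responses stand in for real HTTP calls.
    fake = iter([{"status": "queued"}, {"status": "processing"}, {"status": "completed"}])
    print(wait_for_job(lambda _job_id: next(fake), "550e8400-e29b-41d4-a716-446655440000", interval=0.0))
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP library and makes it straightforward to test without network access.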
Get the result of a completed document processing job with a GET request to /api/documents/{job_id}/result.
Path Parameters:
- job_id - The ID of the completed job
The response format depends on the document type and processing options, but generally includes:
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"result": {
"text": "The extracted document text...",
"metadata": {
"filename": "document.pdf",
"file_size": 12345,
"file_type": "pdf",
"page_count": 5,
"created_at": "2023-01-01T12:00:00",
... (other metadata)
},
"pages": [
{
"page_num": 1,
"text": "Page 1 text...",
... (page-specific data)
},
... (other pages)
],
"entities": {
"people": ["John Doe", "Jane Smith"],
"organizations": ["Acme Corp", "Example Inc."],
"locations": ["New York", "London"],
"dates": ["January 1, 2023", "2023-01-01"],
"key_phrases": ["important document", "contract agreement"]
},
... (other extracted data)
}
}
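Since the result format varies with document type and options, client code is best written defensively. The sketch below reads per-page text and entity lists from a decoded result payload; the helper names are hypothetical, and it assumes that `pages` and `entities` may be absent (for example when entity extraction was not requested).

```python
def page_texts(payload: dict) -> dict[int, str]:
    """Map each page number to its extracted text."""
    pages = payload.get("result", {}).get("pages", [])
    return {page["page_num"]: page.get("text", "") for page in pages}


def entity_mentions(payload: dict) -> list[tuple[str, str]]:
    """Flatten the entities object into (category, value) pairs."""
    entities = payload.get("result", {}).get("entities") or {}
    return [(kind, value) for kind, values in entities.items() for value in values]


if __name__ == "__main__":
    sample = {
        "job_id": "550e8400-e29b-41d4-a716-446655440000",
        "result": {
            "text": "The extracted document text...",
            "pages": [{"page_num": 1, "text": "Page 1 text..."}],
            "entities": {"people": ["John Doe", "Jane Smith"], "locations": ["New York"]},
        },
    }
    print(page_texts(sample))
    print(entity_mentions(sample))
```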
Get statistics about the processing queue.
{
"stats": {
"queue_length": 5,
"processing_count": 2,
"completed_count": 100,
"failed_count": 3,
"avg_processing_time": 2.5
},
"timestamp": 1616161616.123
}
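One simple use of these statistics is to derive a failure rate for monitoring. The sketch below is illustrative only: `failure_rate` is a hypothetical helper, and it assumes that `completed_count` and `failed_count` cover the same time window, which the response does not state.

```python
def failure_rate(stats_payload: dict) -> float:
    """Return failed jobs as a fraction of all finished jobs (0.0 if none finished)."""
    stats = stats_payload["stats"]
    finished = stats["completed_count"] + stats["failed_count"]
    return stats["failed_count"] / finished if finished else 0.0


if __name__ == "__main__":
    sample = {
        "stats": {
            "queue_length": 5,
            "processing_count": 2,
            "completed_count": 100,
            "failed_count": 3,
            "avg_processing_time": 2.5,
        },
        "timestamp": 1616161616.123,
    }
    print(f"{failure_rate(sample):.1%}")
```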
Check the API health status.
{
"status": "ok",
"version": "1.0.0"
}
# Upload a local file (multipart/form-data)
curl -X POST "https://api.example.com/api/documents" \
-H "X-API-Key: your_api_key_here" \
-F "file=@document.pdf" \
-F 'options={"language":"eng","extract_entities":true}'
# Upload a document by URL (application/json)
curl -X POST "https://api.example.com/api/documents" \
-H "X-API-Key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/path/to/document.pdf",
"options": {
"language": "eng",
"extract_entities": true
}
}'
# Check status
curl "https://api.example.com/api/documents/550e8400-e29b-41d4-a716-446655440000/status" \
-H "X-API-Key: your_api_key_here"
# Get results when job is completed
curl "https://api.example.com/api/documents/550e8400-e29b-41d4-a716-446655440000/result" \
-H "X-API-Key: your_api_key_here"
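The API uses standard HTTP status codes to indicate the success or failure of a request. In general, 2xx codes indicate success, 4xx codes indicate a problem with the request (such as invalid parameters or a missing API key), and 5xx codes indicate an error on the server.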
Error responses have the following format:
{
"error": "Error message",
"details": {
// Additional error details
}
}
Status Code | Error | Description |
---|---|---|
400 | Bad Request | Invalid request parameters |
401 | Unauthorized | Missing or invalid API key |
404 | Not Found | Resource not found |
429 | Too Many Requests | Rate limit exceeded |
500 | Server Error | Internal server error |
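A client can turn the error format and status codes above into a single exception type. The following is a sketch under stated assumptions: `ApiError` and `parse_error` are hypothetical names, and treating 429 and 500 as retryable is a policy choice of this example, not documented API behavior.

```python
import json

# Assumption: retrying on these codes is this sketch's policy, not something the API specifies.
RETRYABLE_STATUS_CODES = {429, 500}


class ApiError(Exception):
    """An error response from the API, carrying the status code and parsed body."""

    def __init__(self, status_code: int, message: str, details=None):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message
        self.details = details or {}

    @property
    def retryable(self) -> bool:
        return self.status_code in RETRYABLE_STATUS_CODES


def parse_error(status_code: int, body: str) -> ApiError:
    """Build an ApiError from a response body, tolerating non-JSON bodies."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return ApiError(status_code, data.get("error", "Unknown error"), data.get("details"))


if __name__ == "__main__":
    err = parse_error(401, '{"error": "Missing or invalid API key", "details": {}}')
    print(err, err.retryable)
```

Falling back to a generic message when the body is not valid JSON avoids masking the original HTTP failure with a parsing exception.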