API Documentation

Introduction

The Document Processing API extracts text and structured content from PDFs, Microsoft Office documents, and images. It includes OCR for processing scanned documents.

This API follows RESTful principles and returns data in JSON format. All requests require authentication using an API key.

Authentication

Every API request must include an API key, which you can obtain from your dashboard after signing up.

Include your API key in one of the following ways:

HTTP Header (Recommended)

X-API-Key: your_api_key_here

Query Parameter

https://api.example.com/api/documents?api_key=your_api_key_here
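The two options can be sketched in Python as a pair of small helpers. This is a minimal illustration, not an official client: the helper names are hypothetical, and the host and the /api/documents path are taken from the endpoint examples in this document.

```python
from urllib.parse import urlencode

# Host as used in this document's examples.
BASE_URL = "https://api.example.com"


def auth_headers(api_key: str) -> dict[str, str]:
    """Return the header form of authentication (the recommended option)."""
    return {"X-API-Key": api_key}


def url_with_api_key(path: str, api_key: str) -> str:
    """Return a full URL that carries the key in the api_key query parameter."""
    return f"{BASE_URL}{path}?{urlencode({'api_key': api_key})}"


print(auth_headers("your_api_key_here"))
print(url_with_api_key("/api/documents", "your_api_key_here"))
```

URLs are often recorded in server and proxy logs, which is one likely reason the header form is the recommended one.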

API Endpoints

POST /api/documents

Upload a document for processing.

Request

Accepts either multipart/form-data with a file upload or application/json with a file URL.

Form Data Parameters:
  • file - The document file to upload
  • options (optional) - JSON string of processing options

JSON Parameters:
  • url - URL of the document to download
  • options (optional) - Processing options

Processing Options:
  • language - OCR language code (default: "eng")
  • ocr - Force OCR processing (default: auto-detected)
  • extract_entities - Extract named entities (default: false)
  • dpi - DPI for image processing (default: 300)

Response
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "submitted_at": 1616161616.123,
  "status_url": "https://api.example.com/api/documents/550e8400-e29b-41d4-a716-446655440000/status"
}
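As a rough sketch of the JSON variant of this request, the following Python builds the POST with the standard library and reads job_id and status_url from the response. The function names are hypothetical, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request

API_URL = "https://api.example.com/api/documents"


def build_url_job_request(api_key: str, document_url: str, **options) -> Request:
    """Build (but do not send) a POST asking the API to fetch a document by URL."""
    body = {"url": document_url}
    if options:
        # Processing options such as language, ocr, extract_entities, dpi.
        body["options"] = options
    return Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_upload_response(raw: bytes) -> tuple[str, str]:
    """Return (job_id, status_url) from the upload response body."""
    payload = json.loads(raw)
    return payload["job_id"], payload["status_url"]


request = build_url_job_request(
    "your_api_key_here",
    "https://example.com/path/to/document.pdf",
    language="eng",
    extract_entities=True,
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would be a call such as urllib.request.urlopen(request), with the response body passed to parse_upload_response. Options left out of the body fall back to the defaults listed above.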
GET /api/documents/{job_id}/status

Check the status of a document processing job.

Parameters
  • job_id - The ID of the job to check (from the upload response)

Response

The status field is one of "queued", "processing", "completed", or "failed".

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "submitted_at": 1616161616.123,
  "completed_at": 1616161620.456
}
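Because processing is asynchronous, a client typically polls this endpoint until the job reaches "completed" or "failed". The loop below is one possible sketch. The function name, the polling interval, and the timeout are assumptions, not values prescribed by the API, and the HTTP call is passed in as a callable so the logic can be shown with canned responses.

```python
import time
from typing import Callable

# Statuses after which the job will not change again.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,   # assumed polling interval in seconds
    timeout: float = 300.0,  # assumed upper bound on total waiting time
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is "completed" or "failed", or the timeout elapses."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still {status['status']!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Demonstration with canned responses instead of real HTTP calls.
responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed"},
])
final = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"])
```

In real use, fetch_status would issue the GET request to the status_url returned by the upload call and decode the JSON body.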
GET /api/documents/{job_id}/result

Get the result of a completed document processing job.

Parameters
  • job_id - The ID of the completed job

Response

The response format depends on the document type and processing options, but generally includes:

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "result": {
    "text": "The extracted document text...",
    "metadata": {
      "filename": "document.pdf",
      "file_size": 12345,
      "file_type": "pdf",
      "page_count": 5,
      "created_at": "2023-01-01T12:00:00",
      ... (other metadata)
    },
    "pages": [
      {
        "page_num": 1,
        "text": "Page 1 text...",
        ... (page-specific data)
      },
      ... (other pages)
    ],
    "entities": {
      "people": ["John Doe", "Jane Smith"],
      "organizations": ["Acme Corp", "Example Inc."],
      "locations": ["New York", "London"],
      "dates": ["January 1, 2023", "2023-01-01"],
      "key_phrases": ["important document", "contract agreement"]
    },
    ... (other extracted data)
  }
}
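Since the response shape varies, client code is safer when it treats everything except job_id and result as optional. The sketch below shows one way to read the fields listed above. The function name is hypothetical, and it assumes that entities may be missing when extract_entities was not requested.

```python
def summarize_result(payload: dict) -> dict:
    """Pull a few commonly used fields out of a result payload."""
    result = payload["result"]
    pages = result.get("pages", [])
    # Assumption: "entities" is absent unless extract_entities was enabled.
    entities = result.get("entities", {})
    return {
        "job_id": payload["job_id"],
        "page_count": result.get("metadata", {}).get("page_count", len(pages)),
        "text_by_page": {page["page_num"]: page["text"] for page in pages},
        "people": entities.get("people", []),
    }


# Trimmed copy of the sample response above.
sample = {
    "job_id": "550e8400-e29b-41d4-a716-446655440000",
    "result": {
        "text": "The extracted document text...",
        "metadata": {"filename": "document.pdf", "page_count": 5},
        "pages": [{"page_num": 1, "text": "Page 1 text..."}],
        "entities": {"people": ["John Doe", "Jane Smith"]},
    },
}
summary = summarize_result(sample)
print(summary["page_count"], summary["people"])
```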
GET /api/queue/stats

Get statistics about the processing queue.

Response
{
  "stats": {
    "queue_length": 5,
    "processing_count": 2,
    "completed_count": 100,
    "failed_count": 3,
    "avg_processing_time": 2.5
  },
  "timestamp": 1616161616.123
}
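These counters can be turned into a couple of derived figures for monitoring. The sketch below is illustrative only: the function name is hypothetical, the document does not state the unit of avg_processing_time (seconds is assumed), and the backlog estimate assumes processing_count approximates the number of jobs handled in parallel.

```python
def queue_summary(payload: dict) -> dict:
    """Derive a failure rate and a rough backlog estimate from queue stats."""
    stats = payload["stats"]
    finished = stats["completed_count"] + stats["failed_count"]
    failure_rate = stats["failed_count"] / finished if finished else 0.0
    # Assumption: jobs currently processing indicate the available parallelism.
    workers = max(stats["processing_count"], 1)
    # Assumption: avg_processing_time is in seconds.
    backlog_seconds = stats["queue_length"] * stats["avg_processing_time"] / workers
    return {"failure_rate": failure_rate, "backlog_seconds": backlog_seconds}


sample = {
    "stats": {
        "queue_length": 5,
        "processing_count": 2,
        "completed_count": 100,
        "failed_count": 3,
        "avg_processing_time": 2.5,
    },
    "timestamp": 1616161616.123,
}
print(queue_summary(sample))
```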
GET /api/health

Check the API health status.

Response
{
  "status": "ok",
  "version": "1.0.0"
}

Examples

Example 1: Upload a PDF Document

curl -X POST "https://api.example.com/api/documents" \
  -H "X-API-Key: your_api_key_here" \
  -F "file=@document.pdf" \
  -F 'options={"language":"eng","extract_entities":true}'

Example 2: Process a Document from a URL

curl -X POST "https://api.example.com/api/documents" \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/path/to/document.pdf",
    "options": {
      "language": "eng",
      "extract_entities": true
    }
  }'

Example 3: Check Job Status and Get Results

# Check status
curl "https://api.example.com/api/documents/550e8400-e29b-41d4-a716-446655440000/status" \
  -H "X-API-Key: your_api_key_here"

# Get results when job is completed
curl "https://api.example.com/api/documents/550e8400-e29b-41d4-a716-446655440000/result" \
  -H "X-API-Key: your_api_key_here"

Error Handling

The API uses standard HTTP status codes to indicate the success or failure of an API request. In general:

  • 2xx - Success
  • 4xx - Client error (invalid request, authentication, etc.)
  • 5xx - Server error

Error responses have the following format:

{
  "error": "Error message",
  "details": {
    // Additional error details
  }
}

Common Error Codes

Status Code   Error               Description
400           Bad Request         Invalid request parameters
401           Unauthorized        Missing or invalid API key
404           Not Found           Resource not found
429           Too Many Requests   Rate limit exceeded
500           Server Error        Internal server error
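A client can map these codes onto the three classes above and extract the error message from the response body. The following is a minimal sketch with hypothetical function names. Treating 429 and 5xx responses as retryable is a common client-side convention, not something this API documents.

```python
import json


def classify(status_code: int) -> str:
    """Map an HTTP status code to the classes used in this section."""
    if 200 <= status_code < 300:
        return "success"
    if 400 <= status_code < 500:
        return "client_error"
    if 500 <= status_code < 600:
        return "server_error"
    return "unexpected"


def is_retryable(status_code: int) -> bool:
    """Assumed policy: retry rate-limit and server errors, nothing else."""
    return status_code == 429 or 500 <= status_code < 600


def error_message(status_code: int, raw_body) -> str:
    """Format the "error" and "details" fields, tolerating non-JSON bodies."""
    try:
        payload = json.loads(raw_body)
    except (ValueError, TypeError):
        return f"HTTP {status_code}"
    if not isinstance(payload, dict):
        return f"HTTP {status_code}"
    message = payload.get("error", "unknown error")
    details = payload.get("details")
    suffix = f" ({details})" if details else ""
    return f"HTTP {status_code}: {message}{suffix}"


print(classify(429), is_retryable(429))
print(error_message(401, '{"error": "Missing or invalid API key", "details": {}}'))
```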