Submit identity documents and invoices for AI-powered validation, data extraction, and enrichment. Processing runs asynchronously by default — you submit documents, and results are delivered via webhooks or available through polling.
Each request costs 5 credits.
Submitting a Document
Send a POST /document-processing request with the processor ID and one or more files.
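As a rough illustration of the request body, the sketch below assembles a payload from the fields documented on this page. The helper name `build_submission` is hypothetical, and the assumption that base64Data holds plain base64 text (no data-URI prefix) is not confirmed here. Authentication and the base URL are omitted because they are not described in this section.

```python
import base64
import json


def build_submission(processor_id, file_bytes, mime_type, filename,
                     input_data=None, background=True):
    """Build a JSON-serializable body for POST /document-processing (sketch)."""
    body = {
        "processorId": processor_id,
        "files": [{
            "mimeType": mime_type,
            "filename": filename,
            # Assumption: base64Data is plain base64 text without a data-URI prefix.
            "base64Data": base64.b64encode(file_bytes).decode("ascii"),
        }],
        "backgroundProcessing": background,
    }
    if input_data:
        body["inputData"] = input_data
    return body


# Made-up file content, for illustration only.
payload = build_submission("INVOICE", b"%PDF-1.4 example", "application/pdf", "invoice.pdf")
print(json.dumps(payload, indent=2))
```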
Instead of base64Data, a file entry can reference a publicly accessible URL through its url field.
Request Fields
| Field | Required | Description |
|---|---|---|
| processorId | Yes | Which document processor to run (see Available Processors below) |
| files | Yes | Array of document files. Each file needs mimeType and filename, plus either base64Data or url |
| inputData | No | Key-value context passed to the AI for cross-checking extracted fields against known values |
| backgroundProcessing | No | Default true. Set to false to wait synchronously for the full result |
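The per-file requirements in the table can be checked on the client before sending anything. This is a minimal sketch; `file_entry_problems` is a hypothetical helper, and whether the API accepts an entry that carries both base64Data and url is not stated here, so the check does not flag that case.

```python
def file_entry_problems(entry):
    """Return a list of problems found in one element of the files array."""
    problems = []
    for key in ("mimeType", "filename"):
        if not entry.get(key):
            problems.append(f"missing {key}")
    # Each file needs either inline data or a URL.
    if not (entry.get("base64Data") or entry.get("url")):
        problems.append("needs base64Data or url")
    return problems


ok_entry = {"mimeType": "image/png", "filename": "id.png", "url": "https://example.com/id.png"}
bad_entry = {"filename": "id.png"}
print(file_entry_problems(ok_entry))
print(file_entry_problems(bad_entry))
```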
Available Processors
| Processor ID | Accepted Formats | Description |
|---|---|---|
| SRI_LANKA_ID | image/jpeg, image/png, application/pdf | Sri Lankan NIC (old and new), passport, and driver's license. Full KYC pipeline with visual integrity checks, voter registration enrichment, and cross-validation against national records |
| INVOICE | image/jpeg, image/png, application/pdf | Commercial invoices. Extracts supplier, customer, line items, amounts, and currency |
| SRI_LANKA_FORM1_COMPANY_REGISTRATION | application/pdf | Extracts company registration data from Sri Lankan Form 1 documents |
| SRI_LANKA_FORM15_ANNUAL_RETURN | application/pdf | Extracts annual return data from Sri Lankan Form 15 documents |
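Because a rejected request has to be fixed and resent, it can help to mirror the table above in client code and catch mismatches early. The sketch below is one way to do that; `precheck` is a hypothetical helper, and it reuses the request error codes documented under Request Errors purely as return labels.

```python
# Accepted MIME types per processor, copied from the table above.
ACCEPTED_FORMATS = {
    "SRI_LANKA_ID": {"image/jpeg", "image/png", "application/pdf"},
    "INVOICE": {"image/jpeg", "image/png", "application/pdf"},
    "SRI_LANKA_FORM1_COMPANY_REGISTRATION": {"application/pdf"},
    "SRI_LANKA_FORM15_ANNUAL_RETURN": {"application/pdf"},
}


def precheck(processor_id, mime_types):
    """Return the error code the request would likely hit, or None if it looks fine."""
    if processor_id not in ACCEPTED_FORMATS:
        return "UNKNOWN_PROCESSOR"
    if not mime_types:
        return "NO_FILES_PROVIDED"
    if any(m not in ACCEPTED_FORMATS[processor_id] for m in mime_types):
        return "UNSUPPORTED_FILE_TYPE"
    return None


print(precheck("SRI_LANKA_FORM1_COMPANY_REGISTRATION", ["image/png"]))
```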
Input Data Fields
The inputData field provides verification context to the AI. Values are matched against extracted document fields, and mismatches are flagged.
For SRI_LANKA_ID:
| Key | Description |
|---|---|
| name | Full name to match against the document |
| dateOfBirth | Expected date of birth (YYYY-MM-DD) |
| idNumber | NIC or passport number to verify |
| address | Residential address to compare (warning-level check) |
For INVOICE:
| Key | Description |
|---|---|
| expectedVendor | Supplier name to verify against the document |
| expectedAmount | Total amount to verify against the document |
Response
A successful submission returns HTTP 201 with the instance details.
Use the instanceId to poll GET /document-processing/{id} for the current status, or subscribe to the documentVerification.instance.updated webhook event.
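Polling can be sketched as a loop that stops once the instance reaches one of the terminal statuses listed under Instance Status (completed, failed, error). In this sketch `fetch_instance` is a hypothetical callable supplied by the caller that performs GET /document-processing/{id} and returns the parsed JSON. The interval and timeout values are arbitrary choices, not documented limits.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "error"}


def wait_for_result(fetch_instance, instance_id, interval=2.0, timeout=120.0,
                    sleep=time.sleep):
    """Poll until the instance reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        instance = fetch_instance(instance_id)
        if instance["status"] in TERMINAL_STATUSES:
            return instance
        if time.monotonic() >= deadline:
            raise TimeoutError(f"instance {instance_id} still {instance['status']}")
        sleep(interval)


# Simulated server responses, so the sketch runs without network access.
fake_responses = iter([{"status": "pending"}, {"status": "running"}, {"status": "completed"}])
final = wait_for_result(lambda _id: next(fake_responses), "example-id", sleep=lambda s: None)
print(final["status"])
```

Passing the fetch function in keeps the loop independent of any particular HTTP client.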
Processing Pipeline
Documents are processed through a multi-stage pipeline. Stages are scheduled as a DAG (directed acyclic graph), meaning independent stages run in parallel for faster throughput.
Pipeline Stages
The SRI_LANKA_ID processor runs the full 5-stage identity verification pipeline:
| Stage | ID | Description |
|---|---|---|
| Crop & Detect Faces | crop_and_detect_faces | Scans and crops the document image, detects faces, and uploads processed images for downstream stages |
| Pre-Validate & Extract | pre_validate_and_extract | Combined AI-powered document validation and structured data extraction. Runs after cropping completes |
| Visual Integrity Check | visual_integrity_check | Runs visual matching and document tampering detection in parallel with extraction. Flags signs of digital manipulation |
| Post-Validation | post_validation | Merges extraction and visual integrity results, runs processor-specific programmatic validation rules and AI post-validation checks |
| Voter Registry | voter_registry | Fetches voter registration data by NIC number from the Sri Lanka Elections eServices and cross-validates extracted fields (name, gender) across Sinhala, Tamil, and English |
Other processors (INVOICE, SRI_LANKA_FORM1_COMPANY_REGISTRATION, SRI_LANKA_FORM15_ANNUAL_RETURN) run a simplified pipeline without the visual integrity check and voter registry stages.
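To make the DAG idea concrete, the sketch below groups the SRI_LANKA_ID stages into waves of stages that could run at the same time. The dependency map is an assumption pieced together from the stage descriptions: in particular, the edges for visual_integrity_check and voter_registry are guesses and not a statement of how the service wires its stages. It uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# stage -> stages it waits for. ASSUMED from the stage descriptions above;
# the visual_integrity_check and voter_registry edges are guesses.
DEPENDS_ON = {
    "crop_and_detect_faces": set(),
    "pre_validate_and_extract": {"crop_and_detect_faces"},
    "visual_integrity_check": {"crop_and_detect_faces"},
    "post_validation": {"pre_validate_and_extract", "visual_integrity_check"},
    "voter_registry": {"pre_validate_and_extract"},
}


def execution_waves(depends_on):
    """Group stages into waves; stages within one wave have no mutual dependency."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
    print(number, wave)
```

A real scheduler would normally start each stage as soon as its own dependencies finish and would not wait for a whole wave. The wave view is only a simple way to see which stages are independent.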
Instance Status
The status field on an instance reflects the overall lifecycle state:
| Status | Meaning |
|---|---|
| pending | Job accepted, waiting to be picked up |
| running | Pipeline is actively processing (one or more stages in progress) |
| completed | All processing finished successfully |
| failed | The pipeline detected an issue with the document (validation failure, unreadable, wrong document type). This is user-actionable |
| error | An internal or infrastructure issue prevented processing (LLM timeout, service unavailable). This is not user-actionable |
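One way to act on these states in client code is a small mapping from status to the next step. The action names returned below are hypothetical labels for this sketch, not values defined by the API.

```python
def next_action(status):
    """Map an instance status to a client-side next step (labels are illustrative)."""
    if status in ("pending", "running"):
        return "wait"
    if status == "completed":
        return "read_result"
    if status == "failed":
        # Document problem: the end user can fix it by resubmitting.
        return "ask_user_to_resubmit"
    if status == "error":
        # Service problem: nothing the end user can change.
        return "retry_or_contact_support"
    raise ValueError(f"unknown status: {status!r}")


for status in ("pending", "completed", "failed", "error"):
    print(status, "->", next_action(status))
```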
Status Labels
While status tracks the high-level lifecycle, the statusLabel field provides human-readable detail about which stage is currently running or what failed. Status labels update in real time as the pipeline progresses:
| Stage | Running label | Success label | Failed label |
|---|---|---|---|
| crop_and_detect_faces | Scanning document | Document scanned | Scanning failed |
| pre_validate_and_extract | Extracting data | Extraction complete | Extraction failed |
| visual_integrity_check | Extracting data | Visual integrity checks complete | Extraction failed |
| post_validation | Running post-validation | Post-validation complete | Post-validation failed |
| voter_registry | Checking voter registry | Voter registry check complete | Voter registry check failed |
Use status to determine overall success/failure and statusLabel for user-facing progress messages.
Retrieving Results
Single Instance
GET /document-processing/{id} returns the full instance, including validation results, extracted schema fields, and signed file URLs (valid for 5 minutes).
Stage-Level Detail
Returns the raw output of a single pipeline stage, which is useful for debugging. Available stage IDs:
- crop_and_detect_faces
- pre_validate_and_extract
- visual_integrity_check
- post_validation
- voter_registry (SRI_LANKA_ID only)
List Instances
Supports filtering by processorId and status, and pagination via limit and offset. Add includeFileUrls=true to include signed file URLs in the response.
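The query parameters named above can be assembled with the standard library. This sketch only builds the query string; the helper name is hypothetical, and the endpoint path it would be appended to is left out because it is not stated in this section.

```python
from urllib.parse import urlencode


def list_instances_query(processor_id=None, status=None, limit=None, offset=None,
                         include_file_urls=False):
    """Build the query string for listing instances, skipping unset filters."""
    params = {}
    if processor_id is not None:
        params["processorId"] = processor_id
    if status is not None:
        params["status"] = status
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    if include_file_urls:
        params["includeFileUrls"] = "true"
    return urlencode(params)


print(list_instances_query("INVOICE", "completed", limit=20, offset=40, include_file_urls=True))
```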
Result Structure
When processing completes, the instance result contains:
- validation: Overall validation status (success, success_with_warnings, failed), rule-by-rule results, and user-facing error messages
- extraction: Extraction status and rule evaluations
- schema: Extracted structured data as key-value pairs, where each value includes:
  - value: The normalized extracted value
  - originalText: The raw text as it appears on the document
  - originalTextConfidence: Confidence score (0–1)
  - isOriginalTextHandwritten: Whether the text was handwritten
- postProcessing: Post-processing rule results and any schema modifications (derived, corrected, or computed values)
- stages: Per-stage results with status and stage-specific data
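The per-field metadata in schema makes it possible to route uncertain values to manual review. The sketch below flags fields that are handwritten or fall under a confidence threshold. The helper name, the 0.8 threshold, and the sample data are all made up for illustration; only the four per-field keys come from the list above.

```python
def fields_needing_review(schema, min_confidence=0.8):
    """Return {field name: value} for fields that look unreliable."""
    flagged = {}
    for name, field in schema.items():
        confidence = field.get("originalTextConfidence")
        handwritten = field.get("isOriginalTextHandwritten", False)
        low = confidence is not None and confidence < min_confidence
        if handwritten or low:
            flagged[name] = field.get("value")
    return flagged


# Made-up sample of a schema object.
sample_schema = {
    "supplierName": {"value": "Acme Ltd", "originalText": "ACME LTD",
                     "originalTextConfidence": 0.97, "isOriginalTextHandwritten": False},
    "totalAmount": {"value": "1250.00", "originalText": "1,250.00",
                    "originalTextConfidence": 0.62, "isOriginalTextHandwritten": False},
    "notes": {"value": "paid", "originalText": "paid",
              "originalTextConfidence": 0.91, "isOriginalTextHandwritten": True},
}
print(fields_needing_review(sample_schema))
```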
LK Identity Verification (SRI_LANKA_ID)
The SRI_LANKA_ID processor is the most comprehensive document processor, purpose-built for Sri Lankan KYC workflows. It supports NIC (old 9-digit and new 12-digit formats), passports, and driver's licenses.
What makes it different
- Visual integrity checks — AI-powered visual matching confirms the photo belongs to the document, and tampering detection flags signs of digital manipulation (splicing, overlay edits, resolution inconsistencies)
- Voter registration enrichment — When a NIC number is successfully extracted, the pipeline queries the Sri Lanka Elections eServices to fetch voter registration data and cross-validates the extracted name and gender across Sinhala, Tamil, and English
- Multi-language name validation — Uses a lightweight text LLM to compare extracted names against voter registry records across all three official languages
- Face detection — Detects and extracts face images from identity documents for downstream use
Recommended input data
For best results, provide inputData with known values so the AI can cross-check extracted fields.
Mismatches between inputData and extracted values are flagged in the validation results — name, dateOfBirth, and idNumber mismatches produce errors, while address mismatches produce warnings.
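A small builder can keep inputData limited to the documented SRI_LANKA_ID keys and format the date of birth as YYYY-MM-DD. The function name and the sample values are hypothetical.

```python
from datetime import date


def sri_lanka_id_input(name=None, date_of_birth=None, id_number=None, address=None):
    """Build inputData for SRI_LANKA_ID, leaving out values that are not known."""
    data = {}
    if name:
        data["name"] = name
    if date_of_birth:
        # date.isoformat() yields YYYY-MM-DD, the format given for dateOfBirth.
        data["dateOfBirth"] = date_of_birth.isoformat()
    if id_number:
        data["idNumber"] = id_number
    if address:
        data["address"] = address
    return data


# Made-up customer details.
print(sri_lanka_id_input(name="A. Perera", date_of_birth=date(1990, 5, 17)))
```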
Rule error codes
Every rule in the pipeline has a stable errorCode that appears in rulesFailed[].errorCode when that rule fails. You can retrieve the full rule list at runtime via GET /document-processing/processors.
Pre-validation & extraction rules (pre_validate_and_extract stage)
These rules apply to all processors. The AI evaluates them on every submission.
| errorCode | Level | Rule | What it means |
|---|---|---|---|
| DOCUMENT_NOT_PROVIDED | error | document-provided | No document image was found in the input |
| POOR_IMAGE_QUALITY | error | acceptable-documents | Image is blurry, affected by glare, or partially obscured, so text is not reliably readable |
| UNSUPPORTED_DOCUMENT_TYPE | error | supported-document | Document does not match any recognised Sri Lankan ID template (NIC, passport, driver's license) |
| MISSING_DOCUMENT_PAGES | error | all-pages-provided | A multi-page document is incomplete (e.g. only one side of a NIC was submitted) |
| FIELDS_NOT_LEGIBLE | error | all-fields-legible | One or more required fields are obscured, cut off, or unreadable |
| REQUIRED_FIELDS_MISSING | error | data-fields-present | Expected fields for this document type are absent or redacted |
| PAGES_TEMPLATE_MISMATCH | error | all-pages-match-one-template | Pages submitted belong to different document types |
| DOCUMENT_EXPIRED | warning | document-expired | The document's expiry date field is present and has passed today's date. Extraction still runs, but the result is flagged |
Extraction cross-check rules (only fired when inputData is provided)
| errorCode | Level | Triggered when |
|---|---|---|
| NAME_MISMATCH | error | inputData.name does not match the name on the document |
| DOB_MISMATCH | error | inputData.dateOfBirth does not match the date of birth on the document |
| ID_NUMBER_MISMATCH | error | inputData.idNumber does not match the NIC or passport number on the document |
| ADDRESS_MISMATCH | warning | inputData.address does not match the address on the document (warning only; does not fail the instance) |
Visual integrity rules (visual_integrity_check stage)
| errorCode | Level | What it means |
|---|---|---|
| DOCUMENT_TAMPERING_DETECTED | error | AI detected signs of digital manipulation, such as pixel inconsistencies, overlaid text boxes, or damaged security features |
| VISUAL_MATCH_FAILED | info | Document layout does not visually match the expected template (informational only on SRI_LANKA_ID) |
Post-processing rules (post_validation stage)
These are programmatic checks derived from the NIC number itself — no AI involved. They run after extraction completes and only apply when a NIC number was extracted.
| errorCode | Level | What it means |
|---|---|---|
| NIC_FORMAT_INVALID | error | Extracted NIC does not match the old format (9 digits + V/X) or the new format (12 digits) |
| NIC_DOB_MISMATCH | error | Date of birth decoded from the NIC digits does not match the date of birth extracted from the document face |
| NIC_SEX_MISMATCH | warning | Sex decoded from the NIC digits does not match the sex extracted from the document (warning only) |
| NIC_DOB_EXTRACTION_FAILED | error | NIC has valid format but the day-of-year field encodes an impossible date |
| NIC_NUMBER_MISSING | error | A NIC document was identified but no NIC number could be extracted from it |
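The format rule behind NIC_FORMAT_INVALID can be approximated on the client, for example to validate inputData.idNumber before submitting. This is a sketch of the two formats as described in the table, not the service's own check. Trimming whitespace and upper-casing the suffix letter are assumptions of this sketch, and the sample numbers are made up.

```python
import re

OLD_NIC = re.compile(r"[0-9]{9}[VX]")   # old format: 9 digits followed by V or X
NEW_NIC = re.compile(r"[0-9]{12}")      # new format: 12 digits


def nic_format(nic):
    """Return 'old', 'new', or None when the value matches neither format."""
    value = nic.strip().upper()  # assumption: tolerate spaces and lower-case v/x
    if OLD_NIC.fullmatch(value):
        return "old"
    if NEW_NIC.fullmatch(value):
        return "new"
    return None


for candidate in ("853400937V", "198534000937", "12345"):
    print(candidate, nic_format(candidate))
```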
To handle rule errors in your client, switch on rulesFailed[].errorCode for structured handling, or display userErrorMessages directly to end users.
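A minimal sketch of structured handling is to separate blocking failures from advisory ones using the levels given in the rule tables above. The function name is hypothetical. The sketch assumes each rulesFailed entry is an object with an errorCode key, which is all the notation rulesFailed[].errorCode confirms, and the sample list is made up.

```python
# Codes listed with level "warning" or "info" in the rule tables above.
ADVISORY_CODES = {
    "DOCUMENT_EXPIRED",
    "ADDRESS_MISMATCH",
    "NIC_SEX_MISMATCH",
    "VISUAL_MATCH_FAILED",
}


def split_failures(rules_failed):
    """Split failed rules into (blocking, advisory) lists of error codes."""
    blocking, advisory = [], []
    for rule in rules_failed:
        code = rule["errorCode"]
        if code in ADVISORY_CODES:
            advisory.append(code)
        else:
            # Unknown codes are treated as blocking, to stay on the safe side.
            blocking.append(code)
    return blocking, advisory


sample_rules_failed = [
    {"errorCode": "NIC_DOB_MISMATCH"},
    {"errorCode": "ADDRESS_MISMATCH"},
]
print(split_failures(sample_rules_failed))
```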
Error Codes
All errors include a machine-readable errorCode field. There are three categories:
Request Errors (HTTP 4xx)
These indicate a problem with the request itself. Fix the request and retry — no credits are charged.
| Code | HTTP | Description | What to do |
|---|---|---|---|
| UNKNOWN_PROCESSOR | 400 | The processorId does not match any registered processor | Check the processorId against the Available Processors table |
| NO_FILES_PROVIDED | 400 | The files array was empty or missing | Include at least one file in the request |
| UNSUPPORTED_FILE_TYPE | 400 | One or more files have a MIME type not accepted by the selected processor | Check the accepted formats for your processor and convert the file accordingly |
| INSTANCE_NOT_FOUND | 404 | No processing instance exists with the given ID | Verify the instanceId is correct and belongs to your organization |
| STAGE_NOT_FOUND | 404 | The requested stage ID does not exist on the instance | Check the stage IDs listed under Stage-Level Detail |
Document Failures (instance.status = "failed")
These mean the document itself could not be processed successfully. The pipeline ran, but the document did not pass. The errorCode appears on the instance and on the failed stage.
| Code | Description | What to do |
|---|---|---|
| PRE_VALIDATION_FAILED | The document failed pre-validation: wrong document type, poor image quality, or it didn't pass the document type rules | Ask the end user to resubmit with a clearer image of the correct document type |
| EXTRACTION_FAILED | Required fields could not be extracted from the document (e.g. text was unreadable, fields were obscured) | Ask the end user to resubmit with a better-quality image, making sure the document is fully visible and in focus |
| POST_VALIDATION_FAILED | Extracted data failed cross-validation checks (e.g. NIC date of birth doesn't match extracted date of birth, name mismatch) | Review the result.validation.userErrorMessages and result.postProcessing.ruleResults for specifics. This may indicate a fraudulent or inconsistent document |
Check result.validation.userErrorMessages and result.postProcessing.ruleResults for human-readable details to present to the end user.
Infrastructure Errors (instance.status = "error")
These indicate an internal failure — the pipeline could not run due to a service issue, not a problem with the document.
| Code | Description | What to do |
|---|---|---|
| INTERNAL_SERVICE_FAILURE | An internal service failure prevented processing (e.g. LLM timeout, vision service unavailable) | Retry the request. If the issue persists, contact support |
The error field on the instance will contain a generic message. Internal service details (hostnames, URLs) are never exposed in the API response.
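The retry advice for infrastructure errors can be sketched as a bounded retry with exponential backoff. In this sketch `submit_and_wait` is a hypothetical callable that submits the document and returns the final instance. The attempt count and delays are arbitrary choices, and whether retried requests are charged credits is not stated in this section.

```python
import time


def submit_with_retry(submit_and_wait, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Resubmit only when the instance ends in status 'error' (service-side failure)."""
    instance = None
    for attempt in range(1, max_attempts + 1):
        instance = submit_and_wait()
        if instance.get("status") != "error":
            # completed or failed: retrying would not change a document problem.
            return instance
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    return instance  # still 'error' after all attempts: contact support


# Simulated outcomes, so the sketch runs without network access.
outcomes = iter([{"status": "error", "errorCode": "INTERNAL_SERVICE_FAILURE"},
                 {"status": "completed"}])
result = submit_with_retry(lambda: next(outcomes), sleep=lambda s: None)
print(result["status"])
```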
Deleting an Instance
Permanently removes the processing instance and all associated data.
Webhook Events
Subscribe to these events to get real-time notifications:
| Event | When it fires |
|---|---|
| documentVerification.instance.created | A new processing instance has been created |
| documentVerification.instance.updated | The instance status or result has changed |
| documentVerification.instance.deleted | An instance was deleted |
See the Webhooks guide for setup and signature verification details.
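On the receiving side, the three event names can be routed to separate handlers. The sketch below is only the dispatch step: how the event name and payload arrive in the webhook request is an assumption left to the caller, `route_event` is a hypothetical helper, and signature verification is omitted because it is covered by the Webhooks guide.

```python
EVENT_PREFIX = "documentVerification.instance."


def route_event(event_name, payload, handlers):
    """Call the handler registered for created/updated/deleted; return True if handled."""
    if not event_name.startswith(EVENT_PREFIX):
        return False
    action = event_name[len(EVENT_PREFIX):]
    handler = handlers.get(action)
    if handler is None:
        return False
    handler(payload)
    return True


seen = []
handlers = {
    "updated": lambda payload: seen.append(("updated", payload)),
    "deleted": lambda payload: seen.append(("deleted", payload)),
}
# Made-up payload; the real payload shape is not described in this section.
print(route_event("documentVerification.instance.updated", {"status": "completed"}, handlers))
```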

