
FAQs❓

Frequently asked questions about DocuPipe

Updated this week

What are credits? How much will this cost me?

In DocuPipe, all operations cost credits at a fixed rate. The price of purchasing credits varies with your subscription tier, with the cheaper tiers costing more per-credit than the expensive tiers.

Most users fall into one of the two usage patterns below. DocuPipe offers other services, but these two are the bread and butter:

  1. Parsing Only: if all you want is the plain-text representation of your document, which is what you get from uploading a document to DocuPipe, then the cost is 1 credit per page.

  2. Parsing and Standardization: this is the most popular option, where you upload your document and then standardize it according to your schema. Standardization costs an extra 2 credits per page on top of parsing, bringing the total to 3 credits per page.
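As a rough illustration of the arithmetic above, the per-page rates can be turned into a small estimator. The function and constant names are our own and not part of DocuPipe; only the rates (1 credit per page for parsing, 2 more per page for standardization) come from the pricing described above, and the sketch assumes a single standardization pass per page.

```python
# Rates taken from the two usage patterns described above.
PARSE_CREDITS_PER_PAGE = 1
STANDARDIZE_CREDITS_PER_PAGE = 2


def estimate_credits(pages: int, standardize: bool = True) -> int:
    """Estimate credits for a document (hypothetical helper, not a DocuPipe API)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    per_page = PARSE_CREDITS_PER_PAGE
    if standardize:
        # Standardization is charged on top of parsing.
        per_page += STANDARDIZE_CREDITS_PER_PAGE
    return pages * per_page


print(estimate_credits(10, standardize=False))  # parsing only
print(estimate_credits(10))                     # parsing and standardization
```

For a 10-page document this gives 10 credits for parsing only and 30 credits for parsing plus standardization.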

Why do I need to Standardize?

You might be asking: wait, what is this standardization thing? Why do I need it at all? Isn't the parsed text I get from DocuPipe after uploading enough? The answer is: sometimes, but usually not. Here is why.

DocuPipe parsing uses traditional machine learning methods, such as transformer models for OCR and computer vision models for table extraction and for detecting objects like checkmarks, to deliver a semi-structured output that organizes your documents into pages and sections. Each section can be text, a table, or an image, and we give you an ordered representation so that reading the whole thing top to bottom makes sense.

However, this is not the same as the fully structured output you receive from standardization. For that, you define a schema with exactly the fields you want to extract, what they mean, and what types they are (date, integer, number, string, etc.). Standardization gives you the same output format every time, regardless of the input document. That consistency and predictability is what makes it much more useful than parsing on its own. Parsing by itself is mainly useful if you have a downstream use for that representation, for instance feeding it into your own AI system for further business needs.
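To make the idea of a schema concrete, here is a hypothetical invoice schema written as a plain Python dictionary in a JSON Schema-like style, together with a tiny checker that shows what "same output format every time" means in practice. The field names, the dictionary layout, and the `conforms` helper are all assumptions for illustration; the exact schema format DocuPipe expects is not described in this FAQ.

```python
from typing import Any, Dict

# Hypothetical schema: every field has a type and a description of what it means.
invoice_schema: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Identifier printed on the invoice"},
        "issue_date": {"type": "string", "format": "date", "description": "Date the invoice was issued"},
        "line_item_count": {"type": "integer", "description": "Number of line items"},
        "total_amount": {"type": "number", "description": "Grand total including tax"},
    },
}

# Map schema type names to Python types for the illustrative check below.
_PY_TYPES = {"string": str, "integer": int, "number": (int, float)}


def conforms(record: Dict[str, Any], schema: Dict[str, Any]) -> bool:
    """Return True if record has exactly the schema's fields with matching types."""
    props = schema["properties"]
    if set(record) != set(props):
        return False
    for name, spec in props.items():
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _PY_TYPES[spec["type"]]):
            return False
    return True


record = {
    "invoice_number": "INV-001",
    "issue_date": "2024-01-31",
    "line_item_count": 3,
    "total_amount": 149.5,
}
print(conforms(record, invoice_schema))
```

Whatever the layout of the source invoice, a standardized result would be expected to carry these same four fields with these same types, which is what makes downstream code simple to write.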

What is the difference between Standardization and Analysis?

Standardization is the workhorse of DocuPipe, covering most use-cases. It is the appropriate service when you have many documents that all adhere to a common schema, meaning you want to extract the same fields from all of them.

Analysis is more suitable when a specific document is its own unique snowflake, and you don't know in advance which questions you want to ask of it. For instance, imagine you have your own SaaS in which your users send you a free-text question about a document (you don't know their question in advance, and it's not a repeated question that you always ask of such documents). In that case, analysis is a better solution, as it does not require a schema and lets you ask ad-hoc questions.

What is Classification? How does that fit in?

Classification is a triaging feature intended for users who do not necessarily know in advance what type of document they are uploading. For instance, they may be routing an incoming stream of faxes directly to DocuPipe, which can include multiple types of documents (e.g. Invoices, Utility Bills, Bank Statements). For such cases, the classification feature can determine which class the document belongs to, usually so it can be standardized with the appropriate schema.

Since classification is priced much lower than standardization, it is economical to run it first to determine whether standardization is even needed (maybe this document does not fit the intended schema at all), or which schema to use. But for most users, who do know what they are uploading in advance, classification is unnecessary, because you already know the class of the document you uploaded.

What languages are supported?

DocuPipe's OCR engine supports comprehensive multilingual document processing, with validated performance across more than 100 languages. The system demonstrates exceptional accuracy with Latin-script languages and maintains strong performance across diverse writing systems including Cyrillic, Arabic, Asian scripts, and others. Our models are continuously trained on extensive multilingual datasets to ensure reliable extraction across global document types.


For AI-powered standardization and analysis, the system provides full functionality across all supported languages, capable of understanding and extracting structured data regardless of the source document's language. However, for optimal performance, we recommend providing extraction instructions and schema definitions in English, as this yields the most accurate and consistent results.

Performance considerations: While printed text across all supported languages generally performs well, OCR accuracy for handwritten content can be more variable, particularly for niche languages with complex scripts (e.g., handwritten Hebrew, Arabic, or Asian characters). For critical applications involving handwritten documents in these languages, we recommend testing with sample documents to ensure acceptable accuracy levels.

Validated high-performance languages include:

  • Latin Script: English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Czech, Slovak, Hungarian, Romanian, Croatian, Serbian, Slovenian, Estonian, Latvian, Lithuanian, Finnish, Swedish, Norwegian, Danish, Icelandic, Irish, Welsh, Catalan, Basque, Maltese, Turkish, Indonesian, Malay, Vietnamese, Filipino

  • Cyrillic Script: Russian, Ukrainian, Bulgarian, Serbian, Macedonian, Belarusian, Kazakh, Kyrgyz, Uzbek, Mongolian

  • Arabic Script: Arabic, Persian, Urdu, Pashto, Kurdish

  • Asian Scripts: Chinese (Simplified & Traditional), Japanese, Korean, Thai, Hindi, Bengali, Tamil, Telugu, Malayalam, Kannada, Marathi, Gujarati, Punjabi, Nepali, Sinhala, Burmese, Khmer, Lao

  • Other Scripts: Hebrew, Greek, Armenian, Georgian

The system also supports numerous additional languages with varying degrees of optimization. For specific language requirements not listed above, please contact our support team for validation and performance metrics.

What are workflows?

DocuPipe offers a variety of services: parsing, standardization, classification, split, review, and more. Chaining these services together with business logic can unlock powerful and flexible agentic AI workflows for complex document intelligence. To keep this simple, we offer workflows, which automatically trigger certain common follow-up steps once a document upload (what we call parsing) completes. Currently the following options are supported, but more will be added over time, or upon request:

  1. Standardize: upon upload, immediately standardize the document with a list of schemas (usually a single one).

  2. Classify then Standardize: upon upload, immediately classify the document according to your taxonomy, and according to the classification results, conditionally standardize with a schema (you define a mapping of class: schema which tells us in which cases to apply which schema).

The advantage of using a workflow is that instead of first calling DocuPipe with the upload job, and only then executing downstream calls to standardize or classify, you can let DocuPipe handle the logic internally. This both reduces the handoff time between one job finishing and another starting, and minimizes the amount of code you need to write. We recommend using workflows whenever possible to keep things simple on your end, and let us handle the complexity.
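The Classify then Standardize option can be pictured as the routing logic below. This is a local sketch of the control flow only: it does not call DocuPipe, and the function names, the class and schema names, and the stand-in `fake_classify` and `fake_standardize` callables are all hypothetical. With a workflow, this is the kind of logic you no longer need to write yourself.

```python
from typing import Callable, Dict, Optional

# Hypothetical class-to-schema mapping; the names are made up for illustration.
CLASS_TO_SCHEMA: Dict[str, str] = {
    "invoice": "invoice_schema",
    "bank_statement": "bank_statement_schema",
}


def classify_then_standardize(
    document: str,
    classify: Callable[[str], str],
    standardize: Callable[[str, str], dict],
    class_to_schema: Dict[str, str],
) -> Optional[dict]:
    """Classify first, then standardize only if the class maps to a schema."""
    document_class = classify(document)
    schema_id = class_to_schema.get(document_class)
    if schema_id is None:
        # No schema for this class: skip the more expensive standardization step.
        return None
    return standardize(document, schema_id)


# Stand-ins for the real services, so the sketch runs on its own.
def fake_classify(document: str) -> str:
    return "invoice" if "Invoice" in document else "other"


def fake_standardize(document: str, schema_id: str) -> dict:
    return {"schema": schema_id, "source": document}


print(classify_then_standardize("Invoice #42", fake_classify, fake_standardize, CLASS_TO_SCHEMA))
print(classify_then_standardize("Random memo", fake_classify, fake_standardize, CLASS_TO_SCHEMA))
```

The second call returns nothing to standardize because "other" has no entry in the mapping, which mirrors the conditional behaviour described for the workflow.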

