Tabex PDF to Excel API and PDF to XML API

The Tabex PDF to Excel API, together with the PDF to XML API, is designed to offer both performance and flexibility. It is a RESTful API built for speed and precision. Developers can call the API in different modes and integrate it into a variety of workflows: semantic analysis, data capture, scraping data from PDFs, and automation of invoice processing, mortgage processing, and accounts receivable processing.

The Tabex PDF API can be used in lieu of some of the functions offered by the Adobe PDF Library and PDFBox. Developers using the Tabex PDF API have also built applications in which Tabex works in tandem with other PDF libraries and PDF SDK packages. Some of these applications also use a PHP PDF library.

Tabex PDF API Data Capture and Extraction Modes

AUTO In auto mode the API automatically recognizes tables within a document. This mode is suited to lengthy documents with tables scattered throughout. It is also effective when you want to rapidly convert dozens of pages from PDF to Excel and quickly identify all the tables in the document.

PAGE BOX Page box mode lets the developer point the Tabex PDF to Excel API at a particular box on the page. This mode offers high precision in recognizing and extracting PDF tables to Excel whenever you can identify a recurring geometrical layout within a document. It is also essential if you want to build an interactive end-user interface for extracting PDF tables to Excel, CSV, and XML.
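For an interactive interface, the selected rectangle has to be turned into the top, right, bottom and left request parameters (in cm) that the parameter table on this page lists for recognitionMethod=PageWidthBox. The following is a minimal sketch of that conversion, not part of the Tabex API: the helper name selectionToBoxParams, the pixel-based selection object and the A4 page size are all assumptions, as is the reading that each value is the distance of a box edge from the corresponding page edge.

```javascript
// Hypothetical helper: convert a rectangle selected on a rendered page
// (in pixels) into top/right/bottom/left distances in centimetres.
// Assumption: each value is the distance of a box edge from the
// corresponding page edge; verify this against the actual API behaviour.
function selectionToBoxParams(selection, pagePx, pageCm) {
    // Scale factors from rendered pixels to centimetres.
    const cmPerPxX = pageCm.width / pagePx.width;
    const cmPerPxY = pageCm.height / pagePx.height;
    const round = (v) => Math.round(v * 100) / 100; // keep two decimals

    return {
        recognitionMethod: 'PageWidthBox',
        top: round(selection.y * cmPerPxY),
        left: round(selection.x * cmPerPxX),
        bottom: round((pagePx.height - (selection.y + selection.height)) * cmPerPxY),
        right: round((pagePx.width - (selection.x + selection.width)) * cmPerPxX)
    };
}

// Example: an A4 page (21 x 29.7 cm) rendered at 840 x 1188 pixels.
const box = selectionToBoxParams(
    { x: 80, y: 200, width: 680, height: 400 },
    { width: 840, height: 1188 },
    { width: 21, height: 29.7 }
);
console.log(box);
```

The returned object can be merged into the other request parameters before the call is made.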

PAGE WIDTH This mode lets the developer extract the entire PDF page as a table in one of the supported formats. In this case the Tabex PDF to Excel API returns the page within an Excel document, with each sub-table still accurately recognized. This method is suited to cases in which the developer is primarily interested in the numerical and textual data within the tables rather than in determining the table layout accurately. It can be an essential tool when dealing with complex page formats that depart from the standard.

TEMPLATE If your company handles repetitive forms, or certain types of invoice that are more common than others, you can define a dynamic XML template as input to the Tabex API. The Tabex API applies the input template to the data extraction process. Using this option you can achieve 100% data extraction precision and great versatility.

SEMANTIC If you develop your own natural language processing algorithms, the Tabex API lets you and your team leverage specific word meanings to extract table components selectively. You can extract only the rows or columns that contain certain semantic values. You can also build advanced logic based on semantics, such as “extract only tables containing the following terms” or “ignore tables containing the following terms…”.
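This page does not document the request parameters for the semantic mode, so the following is only a client-side sketch of the "extract only tables containing" and "ignore tables containing" logic, applied to tables that have already been extracted. The function names and the in-memory table shape (an array of rows, each an array of cell strings) are assumptions made for illustration.

```javascript
// Hypothetical client-side filter over tables already extracted by the API.
// Assumption: each table is an array of rows, each row an array of cell strings.
function tableContainsAny(table, terms) {
    const wanted = terms.map((t) => t.toLowerCase());
    return table.some((row) =>
        row.some((cell) => wanted.some((t) => String(cell).toLowerCase().includes(t)))
    );
}

// Keep tables that match at least one "include" term (when any are given)
// and none of the "exclude" terms.
function filterTables(tables, { include = [], exclude = [] } = {}) {
    return tables.filter((table) => {
        if (include.length > 0 && !tableContainsAny(table, include)) return false;
        if (exclude.length > 0 && tableContainsAny(table, exclude)) return false;
        return true;
    });
}

// Made-up sample data for illustration.
const tables = [
    [['Item', 'Amount'], ['Invoice total', '120.00']],
    [['Name', 'Phone'], ['Alice', '555-0100']],
    [['Item', 'Amount'], ['Draft estimate', '80.00']]
];
const kept = filterTables(tables, { include: ['amount'], exclude: ['draft'] });
console.log(kept.length); // prints 1: only the first table passes both rules
```

Matching here is a plain case-insensitive substring test; a real semantic pipeline would substitute its own matching function.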

OCR WEBSERVICE The Tabex PDF API is equipped with a powerful and versatile text recognition technology. Tabex OCR is invoked automatically by the Tabex API; however, developers can also use Tabex OCR as an independent OCR API to extract text in a variety of modes. Learn more about Tabex OCR Webservices APIs.

  • XML
  • XLSX
  • XLS
  • CSV
  • HTML
  • TXT
  • JSON*
  • JPG*
  • PNG*

* Send us an email for these formats.

  • Standard: up to 20 MB and 1500 pages
  • Scanned documents supported
  • Inquire for special needs
  • All European languages, Arabic, Chinese, Korean
  • Detects page tilt automatically
  • Detects page rotation automatically
  • High processing speed
  • Up to 1000 pages per minute
  • High Accuracy
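The standard limits listed above (20 MB and 1500 pages) can be checked before a document is uploaded. The sketch below is an illustration only: the function name is hypothetical, the page count is assumed to be known already (for example from a PDF library), and "20 MB" is assumed to mean 20 × 1024 × 1024 bytes.

```javascript
// Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
const MAX_BYTES = 20 * 1024 * 1024;
const MAX_PAGES = 1500;

// Hypothetical pre-flight check; returns a list of problems (empty when OK).
function checkStandardLimits(sizeBytes, pageCount) {
    const problems = [];
    if (sizeBytes > MAX_BYTES) problems.push('file larger than 20 MB');
    if (pageCount > MAX_PAGES) problems.push('more than 1500 pages');
    return problems;
}

console.log(checkStandardLimits(5 * 1024 * 1024, 40));    // within both limits
console.log(checkStandardLimits(25 * 1024 * 1024, 2000)); // exceeds both limits
```

Documents that exceed these limits fall under the "Inquire for special needs" item above.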

 


Guidelines to use Tabex PDF API

A Tabex API call is a simple multipart HTTP request that sends the content of the file to the base URL http://api2.pdfextractoronline.com:8080/tab2ex/api?tab2exkey=

The value of the tab2exkey parameter is an API key (token) obtained from the Tabex team.

The HTTP request accepts the following parameters (see table):

| Parameter Name | Required (Y/N) | Range of Values | Description |
|---|---|---|---|
| tab2exkey | Y | N/A | Key that enables use of the API. |
| pdfDownloadUrl | N | http/https URL | Parses a document directly from a URL. In this case any file in the HTTP request is ignored. |
| fileName | Y | N/A | A name for your file. |
| recognitionMethod | N | auto, PageWidthIgnore, PageWidth, PageWidthBox | Changes the table recognition method: auto; PageWidthIgnore (creates a first table having width = page width); PageWidth (creates a first table having width = page width); PageWidthBox (creates a first table having specific box coordinates {(x1,y1);(x2,y2)}). |
| outputFormat | Y | CSV, XLS, HTML, XML | The desired output format of the conversion. |
| xlsExportFileType | N | xls, xlsx | For output format XLS, selects either the xls or the xlsx file format. |
| xlsExportType | N | SINGLE_FILE_SINGLE_SHEET, SINGLE_FILE_MULTIPLE_SHEET | For output format XLS (xls or xlsx), selects a single sheet or multiple sheets in the output file. |
| xlsDateFormatString | N | N/A | Date format. Default is MM/dd/yyyy. |
| xlsDecimalSeparatorString | N | . or , | For output format XLS, the decimal separator. Default is . |
| xlsThousandsSeparatorString | N | . or , | For output format XLS, the thousands separator. Default is , |
| forceOCR | N | true, false | Forces OCR parsing even if the PDF document does not need it. |
| top | N | Distance in cm | Distance from the top margin. Mandatory for recognitionMethod=PageWidthBox. |
| right | N | Distance in cm | Distance from the right margin. Mandatory for recognitionMethod=PageWidthBox. |
| bottom | N | Distance in cm | Distance from the bottom margin. Mandatory for recognitionMethod=PageWidthBox. |
| left | N | Distance in cm | Distance from the left margin. Mandatory for recognitionMethod=PageWidthBox. |
| ignoreGraphicLines | N | | Ignores the structure of lined tables inside the main table. Mandatory for recognitionMethod=PageWidthBox. |
| lang | N | English, German, French, Spanish, Russian, Italian, Portuguese, Dutch, Finnish, Catalan, Indonesian, Swedish, Turkish, Romanian, Danish, Norwegian, Polish, Hungarian, Estonian, Slovenian, Croatian, Czech, Slovak, Lithuanian, Latvian, Bulgarian, Chinese_Simplified, Chinese_Traditional, Arabic, Korean, Japanese | If you invoke OCR you can set a preferred language to increase the precision of the text recognition. See also Tabex OCR technology. |
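The parameters above can be checked and assembled into a request URL before the file is sent. The sketch below is one way to do that, assuming the parameters are passed in the query string as in the curl example on this page. The helper name buildRequestUrl and its validation rules are inferred from the table and are not part of the Tabex API.

```javascript
const BASE_URL = 'http://api2.pdfextractoronline.com:8080/tab2ex/api';
const OUTPUT_FORMATS = ['CSV', 'XLS', 'HTML', 'XML'];
// ignoreGraphicLines is also marked mandatory for PageWidthBox in the table;
// add it here if the API rejects requests without it.
const BOX_FIELDS = ['top', 'right', 'bottom', 'left'];

// Hypothetical helper: validates parameters against the table above
// and returns the request URL with everything in the query string.
function buildRequestUrl(params) {
    for (const name of ['tab2exkey', 'fileName', 'outputFormat']) {
        if (!params[name]) throw new Error('Missing required parameter: ' + name);
    }
    if (!OUTPUT_FORMATS.includes(params.outputFormat)) {
        throw new Error('Unsupported outputFormat: ' + params.outputFormat);
    }
    if (params.recognitionMethod === 'PageWidthBox') {
        for (const name of BOX_FIELDS) {
            if (params[name] === undefined) {
                throw new Error(name + ' is mandatory for PageWidthBox');
            }
        }
    }
    const url = new URL(BASE_URL);
    for (const [name, value] of Object.entries(params)) {
        url.searchParams.set(name, String(value));
    }
    return url.toString();
}

const requestUrl = buildRequestUrl({
    tab2exkey: 'xxxxx', // placeholder key
    fileName: 'ATTInc.1-39.pdf',
    recognitionMethod: 'auto',
    outputFormat: 'XLS',
    xlsExportFileType: 'xlsx'
});
console.log(requestUrl);
```

Validating on the client catches a missing key or box coordinate before a conversion request is spent on it.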

MONITOR API CONSUMPTION/USAGE

To monitor the status of your conversions you can use the following call:

  • http://api2.pdfextractoronline.com:8080/tab2ex/balance?tab2exkey=XXXXXX

The call returns JSON of the following form:

{"usage":m, "threshold": n}

where:

  • usage: the total number of your conversions completed without errors
  • threshold: the limit on the number of conversions

If usage reaches the threshold value, further conversions are not permitted until you buy a recharge.
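As a small illustration of how the balance response can be interpreted, the sketch below parses a response body and reports how many conversions remain. The function name and the sample numbers are made up; only the usage and threshold fields come from the response format shown above.

```javascript
// Hypothetical helper: interprets the JSON body returned by the balance call.
function remainingConversions(balanceJson) {
    const { usage, threshold } = JSON.parse(balanceJson);
    if (!Number.isFinite(usage) || !Number.isFinite(threshold)) {
        throw new Error('Unexpected balance response: ' + balanceJson);
    }
    // Conversions stop once usage reaches the threshold.
    const remaining = Math.max(threshold - usage, 0);
    return { usage, threshold, remaining, exhausted: remaining === 0 };
}

// Sample body with made-up numbers.
const balance = remainingConversions('{"usage":120, "threshold": 500}');
console.log(balance.remaining, balance.exhausted); // prints: 380 false
```

An application could run this check before submitting a batch and warn the user when the remaining count is lower than the number of documents queued.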

Example: CURL

This is a typical HTTP request using the curl command-line utility:

curl -v -s -o mygettext.xlsx -F file=@ATTInc.1-39.pdf "http://api2.pdfextractoronline.com:8080/tab2ex/api?tab2exkey=xxxxx&fileName=ATTInc.1-39.pdf&recognitionMethod=auto&outputFormat=XLS&xlsExportFileType=xlsx"

The response headers contain information about the status of the call.

Example: JAVA

This is a sample Java class that implements the HTTP request:

package com.tabex.tab2ex.client;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;
import org.springframework.web.client.RestTemplate;

public class Tab2exPublicApi {

    public static void main(String[] args) throws IOException {

        String fileName = "ATTInc.1-39.pdf";
        String uri = "http://api2.pdfextractoronline.com:8080/tab2ex/api";
        String fileNameOut = "ATTInc.1-39.xlsx";

        // Build the multipart form: the API parameters plus the PDF itself.
        MultiValueMap<String, Object> multipartMap = new LinkedMultiValueMap<String, Object>();
        Resource pdfFile = new FileSystemResource(fileName);
        multipartMap.add("tab2exkey", "ABCDEFG"); // replace with your API key
        multipartMap.add("fileName", "ATTInc.1-39");
        multipartMap.add("recognitionMethod", "auto");
        multipartMap.add("outputFormat", "XLS");
        // xlsExportFileType selects xls or xlsx (see the parameter table).
        multipartMap.add("xlsExportFileType", "xlsx");
        multipartMap.add("file", pdfFile);

        RestTemplate template = new RestTemplate();
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.MULTIPART_FORM_DATA);
        HttpEntity<Object> request = new HttpEntity<Object>(multipartMap, headers);
        ResponseEntity<byte[]> httpResponse = template.exchange(uri, HttpMethod.POST, request, byte[].class);

        // Save the converted document only when the call succeeded.
        if (httpResponse.getStatusCode().equals(HttpStatus.OK) && httpResponse.getBody() != null) {
            try (OutputStream output = new FileOutputStream(new File(fileNameOut))) {
                output.write(httpResponse.getBody());
            }
        }
        System.out.println(httpResponse.getHeaders().toString());
    }
}

The response headers contain information about the status of the call.

Example: PHP

This is a sample PHP snippet that implements the HTTP request:

<?php

    $ch = curl_init('http://api2.pdfextractoronline.com:8080/tab2ex/api');

    // Write the response body straight to the output file.
    // CURLOPT_RETURNTRANSFER is not needed when CURLOPT_FILE is used.
    $fp = fopen("sample.xlsx", 'w+');
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_POST, 1);

    $data = array(
        "tab2exkey"    =>    "API_key", // replace with your API key
        "fileName"    =>    "sample",
        "recognitionMethod"    =>    "auto",
        "outputFormat"    =>    "XLS",
        // xlsExportFileType selects the xls or xlsx file format
        "xlsExportFileType"    =>    "xlsx",
        "file" => curl_file_create('sample.pdf')
    );

    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    curl_exec($ch);

    // Flush and close the output file.
    fclose($fp);

?>

The response headers contain information about the status of the call.

Example: Node.js

This is a sample Node.js snippet that implements the HTTP request:

var express = require('express');
var fs = require('fs');
var request = require('request');
var router = express.Router();

router.get('/', function (req, res) {

    res.set('Content-Disposition', 'attachment; filename="sample.xlsx"');

    var formData = {
        tab2exkey: 'API_key', // replace with your API key
        fileName: 'sample',
        recognitionMethod: 'auto',
        outputFormat: 'XLS',
        // xlsExportFileType selects the xls or xlsx file format
        xlsExportFileType: 'xlsx',
        file: fs.createReadStream('sample.pdf')
    };
    request.post({
        url: 'http://api2.pdfextractoronline.com:8080/tab2ex/api',
        formData: formData
    }).pipe(res);

});

module.exports = router;

The response headers contain information about the status of the call.

Example: JavaScript

This is a sample HTML snippet that renders a form used to make the HTTP request:

<form id="tabex-form">
<input id="user-file" accept="application/pdf" name="file" type="file" />
<input type="submit" />
</form>

This is a sample JavaScript snippet that implements the HTTP request using the previous HTML form:

jQuery('#tabex-form').on('submit', function(event){
    event.preventDefault();
    var formData = new FormData();
    formData.append('tab2exkey', 'API_key'); // replace with your API key
    formData.append('fileName', 'sample');
    formData.append('recognitionMethod', 'auto');
    formData.append('outputFormat', 'XLS');
    // xlsExportFileType selects the xls or xlsx file format
    formData.append('xlsExportFileType', 'xlsx');
    formData.append('file', jQuery('#user-file').get(0).files[0]);
    
    var xhttp = new XMLHttpRequest();
    xhttp.onreadystatechange = function() {
        var a;
        // When the request has completed successfully, offer the result as a download.
        if (xhttp.readyState === 4 && xhttp.status === 200) {
            a = document.createElement('a');
            a.href = window.URL.createObjectURL(xhttp.response);
            a.download = "sample.xlsx";
            a.style.display = 'none';
            document.body.appendChild(a);
            a.click();
        }
    };
    xhttp.open("POST", 'http://api2.pdfextractoronline.com:8080/tab2ex/api');
    xhttp.responseType = 'blob';
    xhttp.send(formData);
});

The response headers contain information about the status of the call.

Tabex offers an API to convert PDF documents and extract data directly from your applications. Contact us to instantly receive an API key for a free trial.