Free tool to extract data from PDFs: Tabula
PDFs are handy for displaying articles and books in a well-designed format. But for data analysis? Not so much. Yet there are times when the data you’d like to analyze is available only as a table within a PDF. That’s especially frustrating since, odds are, the data began in a much friendlier database or spreadsheet format.
Enter Tabula, a free, open-source tool designed for “liberating data tables locked inside PDF files.” It was created by several journalists with the support of a number of organizations, including Knight-Mozilla OpenNews, the New York Times and La Nación DATA.
To use it, download the software from the project website. It runs locally in your browser and requires a Java Runtime Environment compatible with Java 6 or 7. Import a PDF and select the area of the table you want to turn into usable data. You can then download the result as a comma- or tab-separated file or copy it to your clipboard.
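Once a table is exported, it can go straight into an analysis workflow. Below is a minimal Python sketch that loads a Tabula export, in either CSV or TSV form, into a list of row dictionaries using only the standard library. It assumes the area you selected included the table's header row. The function name `load_tabula_export` and the sample data are hypothetical illustrations, not part of Tabula itself.

```python
import csv
import io


def load_tabula_export(text: str, tab_separated: bool = False) -> list[dict[str, str]]:
    """Parse CSV or TSV text exported from Tabula into a list of row dicts.

    Assumption: the first row of the export holds the column headers.
    Tabula exports only what you selected, so if the header row was left
    out of the selection, the first data row would be used as headers.
    """
    delimiter = "\t" if tab_separated else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# Hypothetical sample export; real column names depend on the PDF.
sample_csv = "city,population\nSpringfield,30720\nShelbyville,25000\n"

for row in load_tabula_export(sample_csv):
    # Every value is read in as a string, so convert numeric columns explicitly.
    print(row["city"], int(row["population"]))
```

To work from a file saved on disk instead, read it with `open(path, newline="", encoding="utf-8")` and pass its contents to the function.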