Do you sometimes look at those annual reports and financial statements and think ‘I wish they’d provided that in Excel’?

Many companies do exactly this, of course: provide a link to download the various tables in Excel format, so that you don’t have to reproduce the tables yourself to carry out analysis work. See International Power for example, who provide individual Excel links for each summary financial statement; and Man Group who provide this data collated in an Analyst Workbook in Excel format for download. (International Power also provide an Analyst Pack.) Sometimes these links are in the online annual report; sometimes on the main website. But they’re always very useful to those who want to be hands-on with the data.

But not all companies provide this small but useful service to their visitors.

So what do you do? Surely not transcribe the data to recreate the spreadsheets…

How to extract data for analysis from PDF

I recently became aware of Cogniview, who provide software to extract tables from PDF to Excel, and they kindly provided me with a version of PDF2XL to try out and review for you.

So – over to the International Power cash flow statement in the Annual Report to experiment. They do provide an Excel download in their Annual Report as well, but I thought I’d try out PDF2XL on the PDF version.

PDF2XL was remarkably easy to use — very intuitive — as simple as highlighting the table you want to convert, and clicking a button. You can then, of course, tinker with the results to ensure that the table is formatted correctly, before extracting. This is very easy to manipulate, and you can see the effect of the changes you’re making: the PDF opens in the top half of the window, and the bottom half is dedicated to previewing what you’ll get in the extract.

And once the data is extracted, whether to Excel, Word or PowerPoint (I tested all three) it is easily editable within those packages. You’re not getting an image, but the real data for manipulation and analysis, as if you’d typed it in yourself. And if the PDF has multiple pages with the same table layout, you can do them all at once, by setting up a template.

I tested extracting data from a PDF, which is the main intent of the software, and it went very smoothly. If you wanted to use Excel to analyse data given to you in PDF form, this would work well for you.

I then experimented with extracting data from a web page. This is a feature of the Enterprise version, not the Standard version, but I thought that if you were reading an online annual report, this might be what you wanted to do… Of course, the annual report is almost always available in download form as a PDF, so you wouldn’t need the Enterprise version to do this.

Again, this was very easy. All I had to do was to open PDF2XL, then ‘print’ the web page to the Cogniview printer, and it appeared in the main PDF2XL window, ready for me to select the data I was interested in.

I did have a small problem with this, in that when I extracted data while using Firefox 3, some of the words ran together (so that, for example, the Excel table ended up with ‘Taxexpense’ instead of ‘Tax expense’). This isn’t too big an issue; the meat of the table transferred fine. I tested it again using Internet Explorer 8, and it worked perfectly.

Internet Explorer is still the most popular browser worldwide, particularly in major companies, so this isn’t likely to affect many people. Possibly only me, since Cogniview couldn’t reproduce it – so all I would need to do would be to run Internet Explorer to do this task if I wanted to avoid this issue. I don’t see it as a major problem, because the software was still coping very well in both Firefox and IE with this task…

And I was impressed by Cogniview’s excellent customer support, who responded very quickly and thoroughly to all my questions, despite the fact that I wasn’t a paying customer.

“All” my questions? Yes, there was one other issue, which was that indented rows on the PDF lost their indentation when transferred to Excel, with the result that all rows appeared at the same level, left-aligned. Customer support tell me that this feature is scheduled for a future release. I see this as a nice-to-have feature, rather than a core piece of functionality, but would welcome its release sooner rather than later.

If you need this, you’ll know as soon as you see it working

I see a use for this in extracting data from corporate sites and reports for personal analysis, but other people are using it for a great many other purposes, from reducing inventory update time, through supporting academic research, to gaining a competitive edge by analysing their competitors’ figures. Whatever your interest, retyping data from any source always carries the risk of introducing errors, never mind the waste of time in redoing someone else’s work, so having this automated is bound to be more efficient.

The version I was using was the Enterprise version, which converts data from PDF and from application screens and reports to Excel, Word, or PowerPoint (or to CSV). Cogniview also provide a Standard version, which extracts from PDF to Excel (and other outputs), but not from screens, and an OCR version which extracts from scanned PDFs as well as native PDFs (including a workflow to help in fixing the dodgy data you sometimes get with poor quality scans).

Why not trial it for free? You get a week, and a limited number of conversions, but if it’s going to be useful to you, you’ll know immediately – and if you do a lot of transcribing of data into Excel, it’s going to be very useful indeed.

And a bonus thought…

Since websites are my thing, here’s a comment on their site, not just on the tool. Just out of interest, have a look at their productivity calculator: for example, if you or your team transcribe only 3 PDF pages a week at $25/hour, your ROI could be 33% in the first month. Now, that is an excellent sales tool – the kind of thing that could be useful to other companies on their sites too – but it is buried behind the ‘buy now’ button. I’d suggest bringing it forward, even linking to it from the product pages.
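For the curious, the arithmetic behind a calculator of this kind can be sketched roughly as follows. This is a minimal sketch and not Cogniview’s actual formula: the function name, the time saved per page and the software cost are placeholder assumptions of mine, and only the 3 pages a week and the $25/hour rate come from their example, so the result will not match their 33% figure.

```python
def first_month_roi(pages_per_week, hourly_rate, minutes_saved_per_page,
                    software_cost, weeks=4):
    """Return first-month ROI as a fraction: (savings - cost) / cost.

    All inputs other than pages_per_week and hourly_rate are hypothetical
    placeholders, not figures published by Cogniview.
    """
    # Hours of manual retyping avoided over the period
    hours_saved = pages_per_week * weeks * minutes_saved_per_page / 60
    # Value of that time at the given hourly rate
    savings = hours_saved * hourly_rate
    return (savings - software_cost) / software_cost


# 3 pages a week at $25/hour (from the example), with an assumed
# 60 minutes saved per page and an assumed software cost of $200.
roi = first_month_roi(3, 25, minutes_saved_per_page=60, software_cost=200)
print(f"First-month ROI: {roi:.0%}")
```

Swap in your own time-per-page and price to see how quickly the software would pay for itself in your situation.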

I am very happy to see software companies like Adobe and Microsoft beginning to realize that it is no longer a business advantage for them to have ‘closed’ software platforms.

I hope I win the application; I love trying out new software for my readers.

Vicky H

How about Crystal Reports by SAP? Maybe Cogniview can be an add-on, as Crystal can’t process PDF.

