Robust PDF parsing

I’ve ported Didier Stevens’s script to C++. The problem I have is that the parser doesn’t handle PDF files that are malformed but that Adobe Reader X still loads. I found a collection here: – although some of the files there no longer seem to load in Reader X.


If anyone knows of an open source PDF parser that handles these documents, please let me know. I would like to see how it performs the parsing. So far SumatraPDF and PDFMiner do not handle these documents.
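For context, one leniency trick I know of is to ignore the cross-reference table entirely and scan the raw bytes for `<num> <gen> obj` headers. Below is a minimal sketch of that idea, not what Reader X or any particular parser actually does. The names `ObjHeader` and `scanObjects` are made up for this example, and the scan will report false positives if the same byte pattern happens to occur inside stream data.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for this sketch: one indirect object header found in the file.
struct ObjHeader {
    long number;         // object number
    long generation;     // generation number
    std::size_t offset;  // byte offset of the first digit of the object number
};

// PDF whitespace characters: NUL, tab, LF, FF, CR, space.
static bool isPdfSpace(unsigned char c) {
    return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

static bool isDigit(unsigned char c) {
    return std::isdigit(c) != 0;
}

// Scan the raw bytes for "<num> <gen> obj" headers without consulting the xref table.
std::vector<ObjHeader> scanObjects(const std::string& data) {
    std::vector<ObjHeader> found;
    std::size_t pos = 0;
    while ((pos = data.find("obj", pos)) != std::string::npos) {
        const std::size_t kw = pos;  // start of the "obj" keyword
        pos += 3;                    // continue the search after this hit

        // Skip "endobj" and longer words such as "object".
        if (kw >= 3 && data.compare(kw - 3, 3, "end") == 0) continue;
        if (pos < data.size() && std::isalnum(static_cast<unsigned char>(data[pos]))) continue;

        // Walk backwards: whitespace, generation digits, whitespace, number digits.
        std::size_t i = kw;
        while (i > 0 && isPdfSpace(static_cast<unsigned char>(data[i - 1]))) --i;
        if (i == kw) continue;  // this sketch requires whitespace before "obj"
        const std::size_t genEnd = i;
        while (i > 0 && isDigit(static_cast<unsigned char>(data[i - 1]))) --i;
        const std::size_t genStart = i;
        if (genStart == genEnd) continue;
        while (i > 0 && isPdfSpace(static_cast<unsigned char>(data[i - 1]))) --i;
        if (i == genStart) continue;
        const std::size_t numEnd = i;
        while (i > 0 && isDigit(static_cast<unsigned char>(data[i - 1]))) --i;
        const std::size_t numStart = i;
        if (numStart == numEnd) continue;

        // Reject absurdly long digit runs so std::stol cannot overflow.
        if (numEnd - numStart > 9 || genEnd - genStart > 9) continue;

        found.push_back(ObjHeader{
            std::stol(data.substr(numStart, numEnd - numStart)),
            std::stol(data.substr(genStart, genEnd - genStart)),
            numStart});
    }
    return found;
}
```

Calling `scanObjects` on the whole file contents gives a list of candidate object offsets that a parser could fall back on when the xref table or `startxref` pointer turns out to be broken.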


(You can also get a lot of PE tricks here:

You can find Didier’s original code here:


~ by ra1ndog on November 14, 2011.
