Parse PDF file

Dear Forum,

Is there any library, or any other way, to parse a PDF file with C++?

I have tried to use the PoDoFo library, but I never succeeded in installing it: it has too many dependencies, and I'm getting a lot of errors during compilation with MinGW.

Thank you for your time and help!!!

I have recently been looking into the possibility of using PDFs as the basis of a project. I have been looking around at libraries and keep coming back to the Adobe PDF Library[1], but I have yet to ask what the pricing for it is.

As the project is off the books (read: work-related but not work-sanctioned) and I think the Adobe library will not be cheap, I thought I would start with a book:
Developing with PDF: Dive into the Portable Document Format
by Leonard Rosenthol[2]

Edit:
As you don't give much info about what you are doing, I'll just mention that there is also an Acrobat SDK[3] that may be of interest (if Acrobat is something you use).

__________________________________
[1] http://www.datalogics.com/products/pdf/pdflibrary/
[2] http://shop.oreilly.com/product/0636920025269.do
[3] http://www.adobe.com/devnet/acrobat.html
You're not the first person to wonder about this...

Googling "recommend pdf library c++" the first hit I get is:

Open source PDF library for C/C++ application? [closed]
http://stackoverflow.com/questions/58730/open-source-pdf-library-for-c-c-application

which mentions PoDoFo, but also LibHaru (more popular than PoDoFo), Hummus, ...

And

Open Source PDF Libraries and Tools
http://pdf-house.blogspot.co.uk/

mentions

QPDF
http://qpdf.sourceforge.net/

And

A list of open source C++ libraries
http://en.cppreference.com/w/cpp/links/libs

lists

HARU
PoDoFo
JagPDF

...

Andy
Finding a list of open-source libraries is the first step; finding a good one that does all you require is (unfortunately) quite a different matter.
Thank you for your reply! I really appreciate your help.

Well, I want to create a GUI (VS2008 + Qt) that can read a PDF line by line and detect the presence of checkboxes in the PDF, then store the state of each checkbox (true or false) in a database. I proceeded as follows, under VS2008 and with CMake (I dropped MinGW):
1- Built zlib library
2- Built freetype library
3- Built jpeg library
4- Built png library
5- Built PoDoFo (set the appropriate paths (debug and release) of the libraries)

When it came to building PoDoFo, I got these errors:

5>Linking...
3>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Done_FreeType referenced in function "public: __cdecl PoDoFo::PdfFontCache::~PdfFontCache(void)" (??1PdfFontCache@PoDoFo@@QEAA@XZ)
5>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Done_FreeType referenced in function "public: __cdecl PoDoFo::PdfFontCache::~PdfFontCache(void)" (??1PdfFontCache@PoDoFo@@QEAA@XZ)
2>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Done_FreeType referenced in function "public: __cdecl PoDoFo::PdfFontCache::~PdfFontCache(void)" (??1PdfFontCache@PoDoFo@@QEAA@XZ)
4>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Done_FreeType referenced in function "public: __cdecl PoDoFo::PdfFontCache::~PdfFontCache(void)" (??1PdfFontCache@PoDoFo@@QEAA@XZ)
5>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Init_FreeType referenced in function "protected: void __cdecl PoDoFo::PdfFontCache::Init(void)" (?Init@PdfFontCache@PoDoFo@@IEAAXXZ)
3>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Init_FreeType referenced in function "protected: void __cdecl PoDoFo::PdfFontCache::Init(void)" (?Init@PdfFontCache@PoDoFo@@IEAAXXZ)
2>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Init_FreeType referenced in function "protected: void __cdecl PoDoFo::PdfFontCache::Init(void)" (?Init@PdfFontCache@PoDoFo@@IEAAXXZ)
4>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Init_FreeType referenced in function "protected: void __cdecl PoDoFo::PdfFontCache::Init(void)" (?Init@PdfFontCache@PoDoFo@@IEAAXXZ)
5>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Get_Postscript_Name referenced in function "public: class PoDoFo::PdfFont * __cdecl PoDoFo::PdfFontCache::GetFont(struct FT_FaceRec_ *,bool,bool,class PoDoFo::PdfEncoding const * const)" (?GetFont@PdfFontCache@PoDoFo@@QEAAPEAVPdfFont@2@PEAUFT_FaceRec_@@_N1QEBVPdfEncoding@2@@Z)
4>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Get_Postscript_Name referenced in function "public: class PoDoFo::PdfFont * __cdecl PoDoFo::PdfFontCache::GetFont(struct FT_FaceRec_ *,bool,bool,class PoDoFo::PdfEncoding const * const)" (?GetFont@PdfFontCache@PoDoFo@@QEAAPEAVPdfFont@2@PEAUFT_FaceRec_@@_N1QEBVPdfEncoding@2@@Z)
2>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Get_Postscript_Name referenced in function "public: class PoDoFo::PdfFont * __cdecl PoDoFo::PdfFontCache::GetFont(struct FT_FaceRec_ *,bool,bool,class PoDoFo::PdfEncoding const * const)" (?GetFont@PdfFontCache@PoDoFo@@QEAAPEAVPdfFont@2@PEAUFT_FaceRec_@@_N1QEBVPdfEncoding@2@@Z)
3>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Get_Postscript_Name referenced in function "public: class PoDoFo::PdfFont * __cdecl PoDoFo::PdfFontCache::GetFont(struct FT_FaceRec_ *,bool,bool,class PoDoFo::PdfEncoding const * const)" (?GetFont@PdfFontCache@PoDoFo@@QEAAPEAVPdfFont@2@PEAUFT_FaceRec_@@_N1QEBVPdfEncoding@2@@Z)
5>podofo.lib(PdfFontMetricsFreetype.obj) : error LNK2001: unresolved external symbol FT_Get_Postscript_Name
4>podofo.lib(PdfFontMetricsFreetype.obj) : error LNK2001: unresolved external symbol FT_Get_Postscript_Name
2>podofo.lib(PdfFontMetricsFreetype.obj) : error LNK2001: unresolved external symbol FT_Get_Postscript_Name

Well, those three functions are all supposed to come from the FreeType library.
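
To see at a glance which symbols are actually missing, the repeated linker lines can be reduced to a set of distinct names. The following is only a minimal sketch in C++: unresolvedSymbols is a hypothetical helper, not part of any tool mentioned here, and it assumes the log text has already been read into a string and that the symbols are plain C names, as in the errors above.

```cpp
#include <regex>
#include <set>
#include <sstream>
#include <string>

// Collect the distinct names that follow "unresolved external symbol"
// in MSVC linker output (both LNK2019 and LNK2001 lines).
// Assumption: the symbol is a single token with no spaces (a plain C name).
std::set<std::string> unresolvedSymbols(const std::string& log) {
    static const std::regex pattern(R"(unresolved external symbol\s+(\S+))");
    std::set<std::string> names;
    std::istringstream lines(log);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch m;
        if (std::regex_search(line, m, pattern)) {
            names.insert(m[1].str());  // duplicates collapse in the set
        }
    }
    return names;
}
```

Fed the linker output quoted above, this should leave just FT_Done_FreeType, FT_Init_FreeType and FT_Get_Postscript_Name.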

You mention you'd set the path for the libraries, so that's covered.

Open a Visual Studio command prompt, cd to the directory where the FreeType .lib file is found, and run

link /dump /linkermember:1 freetypeXXX.lib

(where XXX will be the version numbers)

then check that the function names look the same as the ones in the errors.

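When building 32-bit (x86) with MSVC I would expect names like

_FT_Init_FreeType
_FT_Done_FreeType
_FT_Get_Postscript_Name

i.e. with a leading underscore.

The names without the underscore look more like what I would expect from GCC, or from a 64-bit (x64) MSVC build, which does not add the underscore. The QEAA in the decorated PoDoFo names in your errors suggests PoDoFo is being built as x64, so also check that the FreeType .lib was built for the same platform.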
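
The comparison described above can also be sketched in code. This is just an illustration in C++, assuming the names from the library listing have been copied into a vector by hand; findSymbol and SymbolMatch are hypothetical names, not part of MSVC or FreeType.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class SymbolMatch { Exact, UnderscoreOnly, Missing };

// Compare a symbol the linker is asking for against the names listed in a
// library. UnderscoreOnly means the library only has the name with a leading
// underscore, i.e. the two sides disagree about name decoration.
SymbolMatch findSymbol(const std::string& wanted,
                       const std::vector<std::string>& listed) {
    auto has = [&](const std::string& name) {
        return std::find(listed.begin(), listed.end(), name) != listed.end();
    };
    if (has(wanted)) return SymbolMatch::Exact;
    if (has("_" + wanted)) return SymbolMatch::UnderscoreOnly;
    return SymbolMatch::Missing;
}
```

An UnderscoreOnly result for FT_Init_FreeType would point at a decoration mismatch between the two builds, while Missing would suggest the wrong .lib (or none at all) is on the link line.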

Andy
btw, just curious, why do you want to parse PDF files? Is it for encryption?
chipp wrote:
btw, just curious, why do you want to parse PDF files?
Massi wrote:
...that can read a PDF line by line and detect the presence of checkboxes in the PDF. Store the state of those checkboxes (true or false) in a database.
Hello chipp, I'm sorry for the late response. I have to create a GUI that can read a PDF file line by line and get the status of checkboxes (checked or not).
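
One note on the checkbox part: a PDF is not really organised in lines. Checkboxes are stored as form field dictionaries of type /FT /Btn whose value /V is a name, with /Off meaning unchecked. The following is only a rough sketch in C++ of how one such dictionary could be classified. It assumes a PDF library has already handed you the dictionary as uncompressed text, and it ignores entries inherited from a parent field; checkboxState and CheckboxState are hypothetical names, not PoDoFo API.

```cpp
#include <regex>
#include <string>

enum class CheckboxState { NotCheckbox, Unknown, Unchecked, Checked };

// Field flag bits of button fields (/Ff entry) in the PDF specification.
constexpr unsigned long kRadio = 1UL << 15;       // bit position 16
constexpr unsigned long kPushbutton = 1UL << 16;  // bit position 17

// Classify the text of one uncompressed field dictionary.
// Assumption: /FT, /Ff and /V all appear directly in this dictionary.
CheckboxState checkboxState(const std::string& dict) {
    static const std::regex buttonType(R"(/FT\s*/Btn\b)");
    static const std::regex flags(R"(/Ff\s+(\d+))");
    static const std::regex value(R"(/V\s*/(\w+))");

    if (!std::regex_search(dict, buttonType)) return CheckboxState::NotCheckbox;

    std::smatch m;
    if (std::regex_search(dict, m, flags)) {
        unsigned long ff = std::stoul(m[1].str());
        // Radio buttons and pushbuttons are also /Btn fields; skip them.
        if (ff & (kRadio | kPushbutton)) return CheckboxState::NotCheckbox;
    }
    // No name-valued /V entry: the state cannot be read from this text alone.
    if (!std::regex_search(dict, m, value)) return CheckboxState::Unknown;
    return m[1].str() == "Off" ? CheckboxState::Unchecked
                               : CheckboxState::Checked;
}
```

Real files often keep these dictionaries inside compressed object streams, so a plain text scan of the file is not enough on its own; a library is still needed to get at the dictionaries.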