I have a PDF file that contains a lot of tabular information, like
1 john maklin testing 20000
I want to convert this PDF file's data into an Excel file.
If it is a one-time task, I would recommend using one of the tools already available on the market. See this link here on SO, where I answered a similar query.
If it is a regular task, you can try integrating Xpdf into your code to build such an application, though I am sure it will be pretty messy :)
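As a rough idea of what that integration could look like, here is a minimal Python sketch that shells out to Xpdf's `pdftotext` command-line tool with its `-layout` option and writes the result as CSV, which Excel can open. The function names and the file names in the usage note are hypothetical, and the column rule (cells separated by two or more spaces) is an assumption about how your PDF comes out. Real files will likely need a smarter splitter.

```python
import csv
import re
import subprocess


def pdf_to_text(pdf_path):
    """Run Xpdf's pdftotext and return the extracted text.

    -layout asks pdftotext to keep the physical layout of the page,
    and "-" as the output file sends the text to stdout.
    Requires the pdftotext executable to be on the PATH.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def split_row(line):
    """Split one line of layout-preserved text into cells.

    Assumption: columns are separated by two or more spaces, so a
    single space inside a cell (e.g. "john maklin") is kept.
    """
    return re.split(r"\s{2,}", line.strip())


def text_to_rows(text):
    """Turn the extracted text into a list of rows, skipping blank lines."""
    return [split_row(line) for line in text.splitlines() if line.strip()]


def write_csv(rows, csv_path):
    """Write the rows to a CSV file that Excel can open."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

With this in place, something like `write_csv(text_to_rows(pdf_to_text("input.pdf")), "output.csv")` would do the conversion. Expect to tune `split_row` per document, since different PDF producers lay out the same table quite differently.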
I could have suggested installing Adobe Professional and driving it from your program, but I have already been down that route and I am sure you wouldn't want to do the same. Adobe has published the PDF specification, but each program seems to interpret it somewhat differently. Because every vendor follows different guidelines when creating PDFs, it is difficult to read arbitrary PDF files reliably.
If you search the web you will come across this link. They claim to have successfully integrated Xpdf with VBA. I have never tested it, so you might want to check it out yourself. If it can be integrated with VBA, there is a good chance it can be integrated with VB.Net as well. I am not sure about Python, though.
Another alternative: if you can output the file directly to CSV instead of PDF at the source, the need to convert PDF to Excel wouldn't arise. I am not sure whether this option is open to you.