3

I need to extract data from PDF files and load it into SQL Server 2008. Can anyone tell me how to proceed?

This is how the data looks

S..
  • **Step one: Search.** This question gets asked literally 3 times a day here. And *inevitably* someone answers with "use iTextSharp". I'm too tired of it to even do that anymore. – Cody Gray - on strike Feb 07 '11 at 15:28
  • @Cody Gray: I did, but I was not able to solve the issue. As you said, I am also tired of the answer "use iTextSharp", which did not help me :( – S.. Feb 07 '11 at 15:32
  • @ramesh - And why do you think asking the same question again will change the answer? If you have more issues, ask about those. – Oded Feb 07 '11 at 15:34
  • @ramesh: So, comment that on the answer to the other question. Then edit your original question to clarify *why*, specifically, iTextSharp didn't work for you. That will automatically bump the question back to the top of the queue so that others will see it again. That's not a good reason to open a duplicate question. And look what's happening. Someone *else* is replying to use iTextSharp, because you didn't explain *why* you couldn't use it. – Cody Gray - on strike Feb 07 '11 at 15:34
  • I tried iTextSharp, but it did not work. Almost all of its functions are for creating and editing a PDF document, not for reading data from one. – S.. Feb 09 '11 at 16:36

2 Answers

2

You will need to use a PDF library such as iTextSharp to extract the data from the PDF.

At this point, you have the data and can insert it into a database.
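
As a rough sketch of that last step only: assuming the text has already been pulled out of the PDF and that each record sits on its own line as two whitespace-separated fields (an assumption; real PDFs will be structured differently), the lines can be parsed and inserted with a parameterized query. Python's built-in `sqlite3` module is used here purely as a stand-in for SQL Server 2008, and the `pdf_data` table, its columns, and both function names are hypothetical.

```python
import sqlite3


def parse_lines(text):
    """Turn extracted text into (name, amount) rows.

    Assumes one record per line with exactly two whitespace-separated
    fields; lines that do not match this shape are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            rows.append((fields[0], float(fields[1])))
    return rows


def load_rows(conn, rows):
    """Insert rows into a hypothetical pdf_data table using parameters."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_data (name TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO pdf_data (name, amount) VALUES (?, ?)", rows
    )
    conn.commit()


# Hypothetical text, standing in for whatever the PDF library returned.
sample_text = "widget 12.50\n\ngadget 7.25\n"
conn = sqlite3.connect(":memory:")  # stand-in for the real database
load_rows(conn, parse_lines(sample_text))
```

Against a real SQL Server the connection and placeholder syntax would depend on the driver, but the idea of parsing first and inserting with parameters rather than string concatenation carries over.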

Oded
  • Can you provide some sample code? – S.. Feb 07 '11 at 16:50
  • @ramesh - Whatever I could post will not work for your specific situation, since every PDF will have different structure. I suggest downloading, installing and experimenting with iTextSharp. Find out how the PDF is structured and how to get to the different parts. – Oded Feb 07 '11 at 17:05
  • I tried iTextSharp, but it did not work. Almost all of its functions are for creating and editing a PDF document, not for reading data from one. – S.. Feb 09 '11 at 16:35
0

Text extraction works well with iText as long as you don't need to extract text column by column instead of row by row (as Adobe Reader and Foxit Reader do when you copy the text from a PDF document). To extract text column by column, the tool needs to calculate the position and coordinates of the text on the page.
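
To illustrate what "calculating position and coordinates" can mean in practice, here is a minimal sketch, not tied to any particular library. It assumes the PDF tool can hand back text fragments as hypothetical `(x, y, text)` tuples, where `x` is the left edge of the fragment and `y` grows upward as in default PDF coordinates. The function name and the `tolerance` parameter are made up for the example.

```python
def group_into_columns(fragments, tolerance=5.0):
    """Group (x, y, text) fragments into columns by left x coordinate.

    Fragments whose x lies within `tolerance` of the first fragment in
    the current column are treated as belonging to that column.
    """
    columns = []  # list of (column_x, [fragments in that column])
    for frag in sorted(fragments, key=lambda f: f[0]):
        if columns and abs(frag[0] - columns[-1][0]) <= tolerance:
            columns[-1][1].append(frag)
        else:
            columns.append((frag[0], [frag]))
    # Read each column top to bottom: y grows upward, so sort descending.
    return [
        [text for _, _, text in sorted(frags, key=lambda f: -f[1])]
        for _, frags in columns
    ]


# Hypothetical fragments from a two-column table.
fragments = [
    (50, 700, "Name"), (200, 700, "Qty"),
    (51, 680, "Bolt"), (200, 680, "4"),
]
print(group_into_columns(fragments))
# prints [['Name', 'Bolt'], ['Qty', '4']]
```

Real documents need more care (right-aligned or centered columns, wrapped cells, varying fonts), so a fixed tolerance on the left edge is only a starting point.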

The commercial tool ByteScout PDF Extractor SDK can do this kind of text extraction in both row-by-row and column-by-column modes (or it can simply extract the data as structured XML).

DISCLAIMER: I work for ByteScout currently

Eugene