0

I have a PDF with rows in the following format:

Category : Demo

Name : abc

Occupation :xyz

Address : abc ,xyz

Category : Demo

Name : 123

Occupation :456

Address : abcd 

This data is repeated in a two-column layout.

Is there any way to import PDF data into SQL Server?

I have converted the PDF to Excel, but that does not give a proper column structure.

How can I import data from a PDF into SQL Server, or how can I do it in C#.NET?

ekostadinov
sam
  • What have you tried? There are plenty of online resources that explain how to read data from a PDF and then store it in a database. – Nagaraj Tantri Sep 15 '14 at 06:54
  • You have two different tasks here: a) reading the data from the PDF and b) writing the data to SQL Server. Search for each of them separately. – SJuan76 Sep 15 '14 at 06:59
  • You can check this http://stackoverflow.com/questions/2579373/saving-any-file-to-in-the-database-just-convert-it-to-a-byte-array/2579467#2579467 – V2Solutions - MS Team Sep 15 '14 at 07:01
  • PDF is a format optimized for *reproducible display* of document content, not for *information extraction* from that document. PDF offers mechanisms to make information extraction easier, too, but the use of these mechanisms is optional. Thus, reliable information extraction in general is only possible using custom programs for the PDF type at hand. If at all, that is. – mkl Sep 15 '14 at 10:03

2 Answers

1

There is no standard way to do this. You must develop your own solution tailored to the PDF file and its layout/format. Several APIs can read PDF content, but I suggest using PDFlib TET because it can extract table layouts from a PDF. If the extracted table does not fit your needs, you can build the rows yourself using coordinate-based extraction.
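As a rough illustration of the coordinate-based idea, here is a minimal sketch in Python (the same logic ports directly to C#). It assumes your PDF library has already given you text fragments as `(x, y, text)` tuples with `y` increasing down the page; that tuple shape, the `split_x` boundary, and the function names are my own assumptions, not part of any particular library's API.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment shape: (x, y, text), with y increasing down the page.
Fragment = Tuple[float, float, str]

# Field names taken from the sample rows in the question.
FIELDS = ("Category", "Name", "Occupation", "Address")


def split_columns(fragments: List[Fragment], split_x: float) -> List[str]:
    """Return text lines in reading order: whole left column, then right column."""
    left = sorted((f for f in fragments if f[0] < split_x), key=lambda f: f[1])
    right = sorted((f for f in fragments if f[0] >= split_x), key=lambda f: f[1])
    return [text for _, _, text in left + right]


def parse_records(lines: List[str]) -> List[Dict[str, str]]:
    """Group 'Key : Value' lines into records; a 'Category' line starts a new one."""
    records: List[Dict[str, str]] = []
    for line in lines:
        # Split on the first colon only, so values may contain colons or commas.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or key not in FIELDS:
            continue  # skip blank lines and anything that is not a known field
        if key == "Category" or not records:
            records.append({})
        records[-1][key] = value
    return records
```

Calling `parse_records(split_columns(fragments, split_x))` yields one dictionary per person, which maps naturally onto one table row. The right value for `split_x` depends on your page layout, so you would need to measure it for your file.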

Sercan
0

The best way to do that is to export the PDF file to Excel and then use one of the many applications that can import an Excel file into SQL. As I'm on a Mac, RazorSQL is a nice application for that.
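If you would rather script the import step than use a GUI tool, the pattern is a parameterized bulk insert. The sketch below is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for SQL Server so that it runs anywhere, and the table name `People` and the function name are hypothetical. Against SQL Server you would open the connection with a driver such as `pyodbc`, which also uses `?` placeholders, and create the table with T-SQL instead (`CREATE TABLE IF NOT EXISTS` is SQLite syntax).

```python
import sqlite3
from typing import Dict, List

# Column names taken from the sample rows in the question.
COLUMNS = ("Category", "Name", "Occupation", "Address")


def load_records(conn: sqlite3.Connection, records: List[Dict[str, str]]) -> int:
    """Create the table if needed and insert one row per record; return the row count."""
    # Stand-in DDL: SQLite syntax, not T-SQL. 'People' is a hypothetical table name.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS People "
        "(Category TEXT, Name TEXT, Occupation TEXT, Address TEXT)"
    )
    # Missing fields become NULL rather than raising a KeyError.
    rows = [tuple(rec.get(col) for col in COLUMNS) for rec in records]
    # Placeholders keep values such as 'abc ,xyz' from being interpreted as SQL.
    conn.executemany(
        "INSERT INTO People (Category, Name, Occupation, Address) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Each record here is a dictionary keyed by column name, for example one row read from the exported spreadsheet.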

Farhad