
I have this document: http://isc.wcdn.co.il/w9/v/news/files/1391927813_6747.pdf and I want to export all of its data into a list in my MySQL database. The problem is that no matter how I try to export it, I get gibberish. I have tried converting it with several tools, and none of them worked properly.

Is there a way I can export the data into my MySQL database with the correct encoding?

Thank you

Tomer Gal
  • This is (in my experience) surprisingly difficult - but it can be done. – Strawberry Feb 10 '14 at 00:17
  • Yes, unfortunately it is. Do you know any way of doing this? – Tomer Gal Feb 10 '14 at 00:25
  • The best success that I've had has been with xpdf's pdftotext (using the shell_exec command) - but my files were particularly complicated so you might have an easier time. – Strawberry Feb 10 '14 at 00:35
  • If your pdf files aren't too complex, you might look at this q and a http://stackoverflow.com/questions/6999889/how-to-extract-text-from-the-pdf-document – O. Jones Feb 10 '14 at 00:43
  • The problem (I think) is that your file is poorly constructed. This has something to do with the software used to create the PDF, but it's too technical for me to understand. If I try to parse your file using pdftotext, I get gibberish, but if I reprint it through Acrobat, 'OCR' that, and then parse the result, I get the correct text. – Strawberry Feb 10 '14 at 00:49
  • Wow Strawberry! Can you please somehow send me the correct text? – Tomer Gal Feb 10 '14 at 00:51
  • Sure - there will be some mistakes because OCR isn't perfect, but it should be pretty good. I'm not sure what the protocol is for PMing in SO! – Strawberry Feb 10 '14 at 00:54
  • Apparently there is no way to do this on SO. Luckily, I have an email address dedicated to this sort of thing. Please email me at censored@gmail.com. Thank you so much! – Tomer Gal Feb 10 '14 at 00:58
  • That was only the first page for some reason - I've sent the raw text again with ALL the names – Strawberry Feb 10 '14 at 01:06
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/47182/discussion-between-tomer-gal-and-strawberry) – Tomer Gal Feb 10 '14 at 17:53
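
For reference, the pdftotext route mentioned in the comments could look roughly like the sketch below. It is only a sketch under stated assumptions: it uses Python's `subprocess` instead of PHP's `shell_exec`, the table name `pdf_lines` and column `content` are hypothetical, and `conn` is assumed to be a DB-API connection to MySQL opened with the `utf8mb4` charset (for example through a driver such as PyMySQL). As noted in the comments, pdftotext returned gibberish for this particular file until it was reprinted and OCR'd, so this would apply to the OCR'd copy or to a well-formed PDF.

```python
import subprocess
import unicodedata


def build_pdftotext_cmd(pdf_path):
    # "-enc UTF-8" requests UTF-8 output, "-layout" keeps the physical layout,
    # and the trailing "-" sends the extracted text to stdout.
    return ["pdftotext", "-enc", "UTF-8", "-layout", pdf_path, "-"]


def extract_text(pdf_path):
    # Assumes the pdftotext binary (xpdf or poppler-utils) is on PATH.
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def text_to_rows(text):
    # One row per non-empty line; whitespace is collapsed and the text is
    # NFC-normalized so the same name is always stored the same way.
    rows = []
    for line in text.splitlines():
        cleaned = " ".join(unicodedata.normalize("NFC", line).split())
        if cleaned:
            rows.append((cleaned,))
    return rows


def insert_rows(conn, rows):
    # Hypothetical table and column names; "%s" is the placeholder style
    # used by common MySQL drivers for parameterized queries.
    cur = conn.cursor()
    try:
        cur.executemany("INSERT INTO pdf_lines (content) VALUES (%s)", rows)
        conn.commit()
    finally:
        cur.close()
```

For the stored text to round-trip correctly, the target column would also need a Unicode character set such as `utf8mb4`; a connection or column in a different charset is a common source of gibberish even when the extracted text itself is fine.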

0 Answers