
I need to extract text from a PDF that uses custom fonts, but these fonts prevent copying/pasting, searching, or extracting the text in a readable form with the iText library. The resulting text consists of spaces or non-human-readable characters.

The PDF properties are:

Author: User
Creator: Compart Docponent API
Producer: Compart MFFPDF I/O Filter 2013-03-09 00:51:11
CreationDate: 04/21/16 11:26:59
ModDate: 06/09/16 10:02:16
Tagged: no
Form: none
Pages: 6
Encrypted: no
Page size: 595.2 x 841.92 pts (A4) (rotated 0 degrees)
File size: 312703 bytes
Optimized: yes
PDF version: 1.4

The PDF font info (from running the pdffonts command-line tool) is the same for each font: name: [none]; type: [Type 3]; emb: [yes]; sub: [no]; uni: [yes]

So the PDF seems to have a ToUnicode map, but that is not enough.
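To illustrate why a ToUnicode map by itself may not be enough, here is a minimal sketch in plain Java (no iText calls). It assumes, without having verified this against the sample file, that the map exists but points glyph codes at useless values such as U+0020. The class `ToUnicodeSketch`, its methods, and both CMap fragments are hypothetical and only model the `beginbfchar` pairs of a ToUnicode CMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToUnicodeSketch {

    // Parses "<code> <unicode>" pairs as they appear between beginbfchar and
    // endbfchar in a ToUnicode CMap. Assumes the input holds only bfchar pairs
    // with one-byte codes and a single UTF-16 unit as the target.
    static Map<Integer, Character> parseBfChars(String cmap) {
        Map<Integer, Character> map = new LinkedHashMap<>();
        Matcher m = Pattern
                .compile("<([0-9A-Fa-f]{2})>\\s*<([0-9A-Fa-f]{4})>")
                .matcher(cmap);
        while (m.find()) {
            int code = Integer.parseInt(m.group(1), 16);
            char target = (char) Integer.parseInt(m.group(2), 16);
            map.put(code, target);
        }
        return map;
    }

    // Translates glyph codes to text the way an extractor would: look each
    // code up in the map and fall back to the replacement character.
    static String decode(int[] codes, Map<Integer, Character> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            Character ch = toUnicode.get(code);
            sb.append(ch != null ? ch : '\uFFFD');
        }
        return sb.toString();
    }

    // Fraction of characters that are letters or digits; a crude indicator
    // of whether the extracted text is human readable.
    static double readableRatio(String text) {
        if (text.isEmpty()) {
            return 0.0;
        }
        long readable = text.chars().filter(Character::isLetterOrDigit).count();
        return (double) readable / text.length();
    }

    public static void main(String[] args) {
        // Hypothetical CMap fragments: one meaningful, one mapping every
        // glyph code to a space (an assumption, not taken from the sample PDF).
        String usefulCMap = "<01> <0048>\n<02> <0069>";
        String uselessCMap = "<01> <0020>\n<02> <0020>";
        int[] glyphCodes = {1, 2};

        String good = decode(glyphCodes, parseBfChars(usefulCMap));
        String bad = decode(glyphCodes, parseBfChars(uselessCMap));

        System.out.println("[" + good + "] ratio=" + readableRatio(good));
        System.out.println("[" + bad + "] ratio=" + readableRatio(bad));
    }
}
```

In both cases a tool such as pdffonts would report `uni: yes`, because it only checks that a map is present, not that its targets are meaningful. With Type 3 fonts the glyphs are drawn by arbitrary content streams, so the font itself offers no fallback if the map is unhelpful.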

How can I read the text in a clear way?

Thanks in advance.

G.G.

Guton
  • Is the text extractable with Acrobat? If it is then post the PDF. If not, there's no hope. – Paulo Soares Jun 10 '16 at 18:13
  • Hi Paulo Soares, thanks for the reply. No, it is not extractable with Acrobat. You can download the file here: https://drive.google.com/file/d/0B0f6X4SAMh2KRDJTbm4tb3E1a1U/view – Guton Jun 10 '16 at 19:15
  • Possible duplicate of [Extract text with iText not works: encoding or crypted text?](http://stackoverflow.com/questions/37748346/extract-text-with-itext-not-works-encoding-or-crypted-text) - The sample file is identical. – mkl Aug 08 '16 at 14:45

0 Answers