
I am converting PDF files to text with the pdftotext tool from poppler-utils on Ubuntu. Unfortunately, I keep running into a problem where some files are not converted correctly.

A correctly converted file looks like this:

  82 => '23:00 23:00 - 05:00 05:00 01:30',
  83 => 'Page 1 of 5',
  84 => 'Generated on Feb 05, 2023 17:11',

But some files result in something like this:

  82 => 'WĂƌƚŝĂůK&&;ĞŶĐƌŽĂĐŚĞĚďLJ',
  83 => 'ĚƵƚLJͿ',
  84 => 'ϬϬ͗ϭϯͲϮϯ͗ϱϵ D',
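
To flag affected files automatically, a rough heuristic along these lines might help. This is only a sketch: the function name `looks_garbled` and the 0.3 threshold are assumptions of mine, not anything provided by poppler-utils, and it assumes the expected text is mostly ASCII, as in the correctly converted sample above.

```python
def looks_garbled(lines, threshold=0.3):
    """Return True if the share of non-ASCII characters exceeds `threshold`.

    `lines` is a list of strings as extracted by pdftotext.
    The threshold is an arbitrary assumption and may need tuning.
    """
    # Ignore whitespace so that padding does not dilute the ratio.
    chars = [c for line in lines for c in line if not c.isspace()]
    if not chars:
        return False
    non_ascii = sum(1 for c in chars if ord(c) > 127)
    return non_ascii / len(chars) > threshold


good_sample = [
    "23:00 23:00 - 05:00 05:00 01:30",
    "Page 1 of 5",
    "Generated on Feb 05, 2023 17:11",
]
bad_sample = [
    "WĂƌƚŝĂůK&&;ĞŶĐƌŽĂĐŚĞĚďLJ",
    "ĚƵƚLJͿ",
    "ϬϬ͗ϭϯͲϮϯ͗ϱϵ D",
]
```

This only detects the symptom; it does not explain why the extraction goes wrong for those files.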

Both documents are PDF version 1.4 and appear to have been produced by the same software, so I'm at a loss as to what is causing this problem.

Does anyone have a suggestion for what to try next?

lowflyer7
  • Thanks, that helped. The reading is garbled too and I am not able to see any logic. – lowflyer7 Apr 17 '23 at 13:18
  • If I acquire the file through another device/browser combination, it works fine. I suspect it has something to do with the browser version/configuration?! – lowflyer7 Apr 17 '23 at 13:37
