I am trying to convert some PDFs to text using pdftotext. The conversion works, but some words get squashed together. For example:

"the 2nd day" becomes "the2nd day"
", before me" becomes "beforeme"

and so on. Why does this happen, and how can I get rid of these discrepancies?
I have also tried using Okular (since I use Linux) to convert the PDF to text, but that gives me the same kind of output. This is a real problem because it hinders text extraction a lot.
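
For reference, the conversion can be reproduced with roughly the sketch below. It assumes pdftotext (from poppler-utils) is on the PATH. The file names and the helper names `build_command` and `convert` are hypothetical, made up for this illustration. The optional `-layout` flag is a real pdftotext option that asks it to preserve the physical page layout; I have not confirmed whether it changes the missing-space behaviour.

```python
import shutil
import subprocess


def build_command(pdf_path, txt_path, layout=False):
    """Build the pdftotext argument list (hypothetical helper for this sketch)."""
    cmd = ["pdftotext"]
    if layout:
        # -layout: keep the original physical layout as far as possible
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def convert(pdf_path, txt_path, layout=False):
    """Run pdftotext on one file; raises if the tool is missing or exits non-zero."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    subprocess.run(build_command(pdf_path, txt_path, layout), check=True)


# Example call (file names are placeholders):
# convert("input.pdf", "output.txt")
# convert("input.pdf", "output_layout.txt", layout=True)
```

Comparing the plain output with the `-layout` output for the same page should at least show whether the squashed words depend on the extraction mode.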