I am using pdfminer.six in Python to extract long text data. Unfortunately, the extraction does not always work well, especially around paragraphs and line wrapping. For example, I got the following output:
"2018Annual ReportInvesting for Growth and Market LeadershipOur CEO will provide you with all further details below."
--> "2018 Annual Report Investing for Growth and Market Leadership Our CEO will provide you with all further details below."
Now I would like to insert a space wherever a lowercase letter or a digit is followed by an uppercase letter and then a lowercase letter, so that "2018Annual" becomes "2018 Annual" and "ReportInvesting" becomes "Report Investing", while "...CEO..." remains "...CEO...".
I only found "Split a string at uppercase letters" and https://stackoverflow.com/a/3216204/14635557, but I could not adapt them to my case. Unfortunately, I am completely new to Python.
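For reference, the rule described above can be written as a single regular expression. This is only a minimal sketch, assuming plain ASCII letters and digits; the names `BOUNDARY` and `add_missing_spaces` are made up for illustration and are not part of pdfminer.six. It uses a lookbehind and a lookahead from the standard `re` module so that the match is an empty position and no characters are consumed.

```python
import re

# Zero-width match: a position that is preceded by a lowercase letter or a
# digit and followed by an uppercase letter plus a lowercase letter.
# Assumption: ASCII only; accented letters are not handled here.
BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z][a-z])")


def add_missing_spaces(text):
    """Insert a space at every position matched by BOUNDARY."""
    return BOUNDARY.sub(" ", text)


sample = (
    "2018Annual ReportInvesting for Growth and Market LeadershipOur "
    "CEO will provide you with all further details below."
)
print(add_missing_spaces(sample))
```

Because "CEO" is preceded by a space and contains no uppercase letter followed by a lowercase one, it is left alone. The same rule would also split words such as "iPhone" or "McDonald", so it probably needs adjusting if such names occur in the text.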