I have a document in which some lines contain letter-spaced text (a space between every letter), and I want to remove that spacing.
The problem is that the strings don't all follow the same rules: some have just one space everywhere, including between the words, and some have two or three spaces between the words.
Examples:
"H e l l o g u y s" (one space between the words)
"H e l l o  g u y s" (two spaces between the words)
"H e l l o   g u y s" (three spaces between the words)
All of the above should be converted to "Hello guys".
"T h i s i s P a g e 1" --> "This is Page 1"
I wrote a script that removes every second space, unless the next character is a digit or a capital letter. It works almost OK, since the processed text is German and the words almost always begin with capital letters... almost. Anyway, I'm not satisfied with it, so I'm asking whether there is a neat function for my problem.
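import re

text = text.strip()  # remove spaces from start and end
out = text
# only treat the line as spaced out if roughly every second character is a space
if text.count(' ') >= (len(text) / 2) - 1:
    out = ''
    for idx, c in enumerate(text):
        # keep a space only before a digit, another space or a capital letter, or right after a hyphen
        if c != ' ' or re.match(r'[0-9]|\s|[A-Z0-9ÄÜÖ§€]', text[idx + 1]) or (idx > 0 and text[idx - 1] == '-'):
            out += c
text = out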
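
For what it's worth, the lines with two or three spaces between the words look solvable without the capital-letter heuristic. Here is a minimal sketch, assuming a run of two or more spaces always marks a word break and a single space is only letter spacing (`collapse_spaced` is a name I made up, not an existing function):

```python
import re


def collapse_spaced(line: str) -> str:
    """Join letter-spaced text, treating runs of 2+ spaces as word breaks."""
    line = line.strip()
    tokens = line.split()
    # Leave normal lines alone: only act when every token is a single character.
    if len(tokens) < 2 or any(len(t) > 1 for t in tokens):
        return line
    # Assumption: two or more spaces separate words, one space separates letters.
    words = re.split(r' {2,}', line)
    return ' '.join(w.replace(' ', '') for w in words)


print(collapse_spaced("H e l l o  g u y s"))
print(collapse_spaced("T h i s   i s   P a g e   1"))
```

This does not help with the lines that have only one space everywhere: for "H e l l o g u y s" it returns "Helloguys", because nothing in the string says where the word ends. That case is the one I'm really stuck on.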