I have a list of tokenised sentences; one of them looks like this:
text = ['Selegiline',
'-',
'induced',
'postural',
'hypotension',
'in',
'Parkinson',
"'",
's',
'disease',
':',
'a',
'longitudinal',
'study',
'on',
'the',
'effects',
'of',
'drug',
'withdrawal',
'.']
I want to convert this list into a string, but when punctuation such as '-' or ':' appears, I want to remove the extra space, so the final output would look something like this:
Selegiline-induced postural hypotension in Parkinson's disease: a longitudinal study on the effects of drug withdrawal
I tried splitting the list into chunks of two tokens and checking each pair: if both tokens are words, join them with a single space; otherwise, join them with no space:
def chunks(xs, n):
    # Yield successive n-sized chunks of xs (the last one may be shorter).
    n = max(1, n)
    return (xs[i:i+n] for i in range(0, len(xs), n))

data_first = list(chunks(text, 2))

def check(data):
    second_order = []
    for words in data:
        # Join with a space only if every token in the chunk is alphabetic;
        # using all() over the chunk also handles a final chunk of length 1.
        if all(w.isalpha() for w in words):
            second_order.append(" ".join(words))
        else:
            second_order.append("".join(words))
    return second_order

check(data_first)
But this only joins the tokens within each pair, so I would have to repeat it on the result until everything is merged (a recursive solution). Is there a better way to do this?
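For comparison, one non-recursive option is a single pass that decides, token by token, whether a space belongs in front of it. The following is only a minimal sketch: the names `detokenize`, `NO_SPACE_BEFORE` and `NO_SPACE_AFTER` are hypothetical, and the two punctuation sets are assumptions chosen to fit the example sentence, not a general detokenisation rule.

```python
# Hypothetical rule sets, chosen to fit the example sentence only.
NO_SPACE_BEFORE = {"-", "'", ":", ".", ",", ";", "!", "?"}  # attach to the token on the left
NO_SPACE_AFTER = {"-", "'"}                                 # attach to the token on the right


def detokenize(tokens):
    """Join tokens in one pass, omitting the space around the listed punctuation."""
    parts = []
    prev = None
    for tok in tokens:
        # Insert a space unless this is the first token, the current token
        # attaches to the left, or the previous token attaches to the right.
        if prev is not None and tok not in NO_SPACE_BEFORE and prev not in NO_SPACE_AFTER:
            parts.append(" ")
        parts.append(tok)
        prev = tok
    return "".join(parts)


text = ['Selegiline', '-', 'induced', 'postural', 'hypotension', 'in',
        'Parkinson', "'", 's', 'disease', ':', 'a', 'longitudinal', 'study',
        'on', 'the', 'effects', 'of', 'drug', 'withdrawal', '.']

print(detokenize(text))
```

Unlike the output shown above, this sketch keeps the trailing full stop. Treating every apostrophe as attached on both sides works for "Parkinson's" but would glue quoted words together, so the sets would need adjusting for other inputs.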