I want to join multiple sets of characters iteratively in a string. Example:
mystr = 'T h i s _ i s _ a _ s e n t e n c e'
joins = [('e', 'n'), ('en', 't'), ('i', 's'), ('h', 'is')]
# apply each join in order
for bigram in joins:
    mystr = mystr.replace(' '.join(bigram), ''.join(bigram))
print(mystr)
# T his _ is _ a _ s ent en c e
In the first iteration it joins 'e n' into 'en', then 'en t' into 'ent', and so on. It's important that the joins are done in order, since the join ('en', 't') doesn't work unless ('e', 'n') has already been joined.
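To make the ordering constraint concrete, here is a small sketch that applies the first two joins in both orders. The helper name apply_joins is only for illustration; it wraps the same loop as above.

```python
def apply_joins(text, joins):
    # Same loop as in the question, wrapped in a (hypothetical) helper.
    for bigram in joins:
        text = text.replace(' '.join(bigram), ''.join(bigram))
    return text

mystr = 'T h i s _ i s _ a _ s e n t e n c e'

# ('e', 'n') first, so 'en' exists by the time ('en', 't') is applied.
in_order = apply_joins(mystr, [('e', 'n'), ('en', 't')])
# ('en', 't') first: there is no 'en' token yet, so that join is a no-op.
swapped = apply_joins(mystr, [('en', 't'), ('e', 'n')])

print(in_order)  # T h i s _ i s _ a _ s ent en c e
print(swapped)   # T h i s _ i s _ a _ s en t en c e
```

So whatever replaces the loop has to give the same result as applying the joins one after another, in list order.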
With a 20 MB string and 10k joins, this takes a while, since every join makes a full pass over the string and builds a new copy. I'm looking to optimize this, but I don't know how. Some of the options I've ruled out:
- I didn't use a regex like in this question because I don't know how to write a re.sub call where the substitution is the match itself but joined together
- I didn't use str.translate like in this question either because, as far as I know, translate can only map single characters, and the entries in my joins have multiple characters
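On the re.sub point: the replacement argument of re.sub can be a function that receives the match object, so "the match itself but joined together" can be written as below. This is only a sketch of that idea, not a claim that it is faster: it still makes one pass over the string per join. The function name join_with_regex and the whitespace lookarounds are my own assumptions; the lookarounds make it match whole space-separated tokens only, which is slightly stricter than the plain str.replace loop.

```python
import re

def join_with_regex(text, joins):
    # Hypothetical helper: one re.sub pass per join, applied in list order.
    for left, right in joins:
        # (?<!\S) and (?!\S) require whitespace or the string edge on both
        # sides, so only whole space-separated tokens are matched.
        pattern = re.compile(
            r'(?<!\S)' + re.escape(left) + ' ' + re.escape(right) + r'(?!\S)'
        )
        # The replacement is a function: return the matched text with the
        # separating space removed.
        text = pattern.sub(lambda m: m.group(0).replace(' ', ''), text)
    return text

mystr = 'T h i s _ i s _ a _ s e n t e n c e'
joins = [('e', 'n'), ('en', 't'), ('i', 's'), ('h', 'is')]
print(join_with_regex(mystr, joins))  # T his _ is _ a _ s ent en c e
```

One difference from the plain loop: with the tokens 'he n', str.replace('e n', 'en') would produce 'hen', while the version above leaves the string unchanged because 'e' is not a whole token there.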
Is there any algorithm, string or regex or any other function that would allow me to do this? Thank you!