I'm dealing with a list of strings, some of which contain extra repeated letters compared to their correct spelling, for example:
words = ['whyyyyyy', 'heyyyy', 'alrighttttt', 'cool', 'mmmmonday']
I want to pre-process these strings so that they are spelt correctly, producing a new list:
cleaned_words = ['why', 'hey', 'alright', 'cool', 'monday']
The length of the run of duplicated letters can vary. However, a word such as cool, which legitimately contains a doubled letter, should obviously keep its spelling.
I'm unaware of any Python libraries that do this, and I'd prefer to avoid hard-coding it.
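To make the goal concrete, here is a rough sketch of the behaviour I'm after. It is not a real solution: it assumes a set of known words is available (the `VOCAB` set below is a hypothetical stand-in for a proper dictionary), and the helper names `candidates` and `clean` are made up for illustration. Each run of a repeated letter is shrunk to either one or two copies, and the first resulting spelling found in the word set is kept.

```python
from itertools import groupby, product

# Hypothetical stand-in for a real dictionary of known words (assumption).
VOCAB = {"why", "hey", "alright", "cool", "monday"}

def candidates(word):
    """Yield spellings where each run of a repeated letter has one or two copies."""
    # Split the word into (letter, run_length) pairs, e.g. 'heyy' -> h:1, e:1, y:2.
    runs = [(ch, len(list(grp))) for ch, grp in groupby(word)]
    # A single letter stays as-is; a longer run may collapse to 1 or 2 copies.
    options = [[ch] if n == 1 else [ch, ch * 2] for ch, n in runs]
    for combo in product(*options):
        yield "".join(combo)

def clean(word, vocab=VOCAB):
    """Return the first candidate found in vocab, or the word unchanged."""
    if word in vocab:
        return word
    for cand in candidates(word):
        if cand in vocab:
            return cand
    return word  # no known spelling found; leave the word alone

words = ['whyyyyyy', 'heyyyy', 'alrighttttt', 'cool', 'mmmmonday']
cleaned_words = [clean(w) for w in words]
print(cleaned_words)
```

The number of candidates doubles with every repeated run, so this only seems reasonable for short words, and it still depends entirely on the quality of the word set.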
I've tried this: http://norvig.com/spell-correct.html but the more words I put in the text file, the more likely it seems to suggest an incorrect spelling, so it never actually gets it right, even for words that have no additional letters to remove. For example, eel becomes teel.
...
Thanks in advance.