The solution below comes from the Stack Overflow question expanding-english-language-contractions-in-python.
It works well for contractions. I tried to extend it to handle slang words but ran into the issue described below. I'd also prefer to use a single solution to handle all word conversions (e.g., contraction expansions, slang, etc.).
I extended contractions_dict to also correct slang (see the third entry below):
contractions_dict = {
    "didn't": "did not",
    "don't": "do not",
    "ur": "you are"
}
However, when I run it on words that merely contain the slang term "ur", such as "surprise", I get:
"syou areprise"
The "you are" embedded above sits where the "ur" used to be.
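A quick way to see why this happens: the compiled pattern is only an alternation of the dictionary keys with no anchors, so the regex engine finds "ur" anywhere, including inside a longer word. A minimal check, reusing the dictionary and pattern construction from the question:

```python
import re

# Same dictionary and pattern construction as in the question:
# a bare alternation of the keys, with no word-boundary anchors.
contractions_dict = {"didn't": "did not", "don't": "do not", "ur": "you are"}
pattern = re.compile('(%s)' % '|'.join(contractions_dict.keys()))

# search() scans the whole string, so it finds "ur" inside "surprise".
match = pattern.search("surprise")
print(match.group(0), match.span())  # ur (1, 3)
```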
How can I get an exact, whole-word match on a key in contractions_dict?
In my code below, I tried wrapping the "replace" function in a stricter whole-word regex, but received the error "TypeError: must be str, not function".
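The TypeError is raised while building the pattern string, before re.sub() ever runs: r'(?:\W|^)' + replace tries to concatenate a str with a function object. re.sub() takes the pattern and the repl argument (a replacement string or a callable) separately, so regex anchors cannot be glued onto the callback itself. A minimal reproduction; the exact wording of the message depends on the Python version:

```python
def replace(match):
    # Stand-in for the callback from the question.
    return match.group(0)

error_message = None
try:
    # str + function is not defined, so this line raises TypeError.
    pattern_text = r'(?:\W|^)' + replace + r'(?!\w)'
except TypeError as exc:
    # Python 3.6 reports "must be str, not function";
    # newer versions phrase it differently.
    error_message = str(exc)

print(error_message)
```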
The code:
import re
contractions_dict = {
    "didn't": "did not",
    "don't": "do not",
    "ur": "you are"
}
contractions_re = re.compile('(%s)' % '|'.join(contractions_dict.keys()))
def expand_contractions(s, contractions_dict=contractions_dict):
    def replace(match):
        return contractions_dict[match.group(0)]
    return contractions_re.sub(replace, s)
result = expand_contractions("surprise")
print(result)
# The result is "syou areprise".
# ---
# Try to fix it below with a word match regex around the replace function call.
contractions_re = re.compile('(%s)' % '|'.join(contractions_dict.keys()))
def expand_contractions(s, contractions_dict=contractions_dict):
    def replace(match):
        return contractions_dict[match.group(0)]
    return contractions_re.sub(r'(?:\W|^)'+replace+'(?!\w)', s)
# On the line above I get "TypeError: must be str, not function"
result = expand_contractions("surprise")
print(result)
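One possible way to get whole-word matches, offered as a sketch rather than as the original answer's code: leave replace as the repl argument and move the boundary checks into the compiled pattern. The lookarounds (?<!\w) and (?!\w) do the job of the (?:\W|^) ... (?!\w) wrapper attempted above, but they match without consuming the neighbouring character, so the surrounding text is preserved. The re.escape call is an added precaution in case a future key contains regex metacharacters.

```python
import re

contractions_dict = {
    "didn't": "did not",
    "don't": "do not",
    "ur": "you are",
}

def expand_contractions(s, contractions_dict=contractions_dict):
    # Build the pattern from the dict actually passed in, escaping each key
    # and requiring that no word character sits on either side of a match.
    alternation = '|'.join(map(re.escape, contractions_dict))
    contractions_re = re.compile(r'(?<!\w)(%s)(?!\w)' % alternation)

    def replace(match):
        return contractions_dict[match.group(0)]

    return contractions_re.sub(replace, s)

print(expand_contractions("surprise"))           # surprise
print(expand_contractions("ur late, don't go"))  # you are late, do not go
```

Compiling inside the function means a custom dictionary passed as contractions_dict is actually used; if only the default is needed, the pattern can be compiled once at module level as in the original. For these particular keys, r'\b(%s)\b' would behave the same, since every key starts and ends with a word character.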