How can I correct vowels that were replaced by a special character in a string?

Question

So I have a string, and within that string, certain characters in certain words are replaced with others (typo_text). For example: "USA, Germany, the European Commission, Japan, and Canada to fund the development and equitable rollout of the tests." This would be the correct format, but instead I'm given, "XSX, Gxrmxny, the European Commission, Jxpxn, and Cxnxdx to fund the development and equitable rollout of the tests, treatments and vaccines needed to end the acute phase of the COVID-19 pandemic. I have been creating a script that needs for loops to correct the typos:

def corrected_text(text):
    newstring=""
    for i in text:
        if i not in "aeiouAEIOU":
            newstring=newstring+i
    text=newstring
    return text

I know when I run this, it only removes all vowels from the text. However, it seems to be a step in right direction to help correct the typos and to get a feel for a for loop-based approach.

I have two lists of words that have this issue:

name_G7_countries = ['Canada', 'France', 'Germany', 'Italy', 'Japan', 'UK', 'USA']
mistake =  ['Cxnxdx', 'Frxncx', 'Gxrmxny', 'Xtxly', 'Jxpxn','XK', 'XSX']

I know using something like 'Jxpxn'.replace('x', 'a') may work; however, for other phrases, it may not, so I'm not sure how to proceed from here.

Are the `name_G7_countries` and `mistake` lists are given as a part of the problem? — Yuval.R, Feb 22 '22 at 20:43
There given as lists beforehand to know precisely what countries/phrases give the typos with their respective vowels being replaced by x. The typos are in the mitstake list. — Jake, Feb 22 '22 at 20:46
Convert the mistake strings into regular expressions, with the `x` replaced with a pattern that matches any vowel. Then search for a matching string in `name_G7_countries`. — Barmar, Feb 22 '22 at 20:48
The problem is that the lists might not be aligned, or that 'x' represents any vowel, so the mistakes can be not only 'Jxpxn', but also 'Jepin' and such? — Yuval.R, Feb 22 '22 at 21:05
Yes exactly replacing a certain character to fix the problem for one phrase couldn't solve all the other typos. — Jake, Feb 22 '22 at 21:08
Yes! However Jxpxn'.replace('x', 'a') may work, but 'Xtxly'.replace('x','a') is not going to work because this .replace function replaces all 'x' in the text with 'a', whereas the first letter should be 'I' for 'Italy' — Jake, Feb 22 '22 at 21:11

Yuval.R · Answer 1 · 2022-02-22T21:17:48.353

You can try doing that using regular expressions.
We want to run on the two lists, and then to replace each x with ^[aeiou], and X with ^[AEIOU] (which mean "match any of aeiou, or in the case of the capital X, AEIOU") for the regex, and after that we replace all words which match that pattern, with the original words.

import re

name_G7_countries = ['Canada', 'France', 'Germany', 'Italy', 'Japan', 'UK', 'USA']
mistake =  ['Cxnxdx', 'Frxncx', 'Gxrmxny', 'Xtxly', 'Jxpxn','XK', 'XSX']

def corrected_text(text):
    for original, m in zip(name_G7_countries, mistake):
        m = m.replace('x', '[aeiou]').replace('X', '[AEIOU]')
        text = re.sub(m, original, text)
    return text

So for example, we convert 'Xtxly' to '[AEIOU]t[aeiou]ly', so we match every word which starts with a capital vowel, a 't' after it, a lowercase vowel, and after that 'ly', and replace it with 'Italy'.

score 1 · Answer 2 · answered Feb 22 '22 at 20:51

Create a lookup table, where the keys are the names of the countries with the vowels removed, and the values are the actual names of the countries.

Then, you can look up countries in this table by removing the Xs in the corrupted data.

name_G7_countries = ['Canada', 'France', 'Germany', 'Italy', 'Japan', 'UK', 'USA']
mistake =  ['Cxnxdx', 'Frxncx', 'Gxrmxny', 'Xtxly', 'Jxpxn','XK', 'XSX']

def remove_letters(s, letters_to_remove):
    return ''.join(filter(lambda x: x not in letters_to_remove, s))

country_lookup_table = {remove_letters(s, "AEIOUaeiou"): s 
    for s in name_G7_countries}

result = [country_lookup_table[remove_letters(mistake_country, 'Xx')] 
    for mistake_country in mistake]

# Prints ['Canada', 'France', 'Germany', 'Italy', 'Japan', 'UK', 'USA']
print(result)

Note that we use ''.join() rather than repeated concatenation for the remove_letters() function for efficiency reasons.

This will not work properly, as it will take any letter in the place of 'x', instead of only vowels. — Yuval.R, Feb 22 '22 at 21:30
I'm not sure what you mean. The processing on the correct strings removes the vowels. Regardless, you're never going to be able to guarantee a perfect match, as each 'x' can map to multiple possible vowels. — BrokenBenchmark, Feb 22 '22 at 22:58

How can I correct vowels that were replaced by a special character in a string?

2 Answers2