Replacing multiple accented letters in string with one letter using list comprehension

Question

I have a function which takes a string and has parameters for ignoring case and ignoring accents. It all seems to work when using a for loop for the ignore_accents parameter. When trying to use a list comprehension though, it no longer returns the expected value.

Is this just a syntax error? I am not able to implement the list comprehension. I've been looking at "Best way to replace multiple characters in a string?" and a few other posts.

def count_letter_e_text(file_text, ignore_accents, ignore_case):

    e = "e"
    acc_low_e = ["é", "ê", "è"]

    if ignore_case is True:
        file_text = file_text.lower()

    if ignore_accents is True:

        # this works
        #file_text = file_text.replace("é", e).replace("ê", e).replace("è", e)

        # this works too
        # for ch in acc_low_e:
        #     if ch in file_text:
        #         file_text = file_text.replace(ch, e)

        # does not work as list comprehension
        #file_text = [ch.replace(ch, e) for ch in file_text if ch in acc_low_e] # gives count of 6
        file_text = [file_text.replace(ch, e) for ch in acc_low_e if ch in file_text] # gives count of 0

    num_of_e = file_text.count(e) 

    return num_of_e

Driver program:

text = "Sentence 1 test has e, é, ê, è, E, É, Ê, È"
# expecting count of 12; using list comprehension it is 0
text_e_count = count_letter_e_text(text, True, True)
text_e_count

In your two working scenarios, `file_text` ends up as a string with all of the replacements applied to it. But in the listcomp scenario, it ends up as a *list* of strings, each with only one replacement applied. `.count()` on a list is checking whether any of the individual strings is equal to "e", which is very unlikely; it's not looking into each string to count their "e"s. — jasonharper, Mar 16 '20 at 14:41
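To make the comment above concrete, here is a small sketch reproducing both list comprehensions from the question on the already lower-cased driver text. The names `per_letter` and `only_accents` are introduced here purely for illustration.

```python
text = "sentence 1 test has e, é, ê, è, e, é, ê, è"  # driver text after .lower()
acc_low_e = ["é", "ê", "è"]

# One complete string per accented letter, each with only that letter replaced.
per_letter = [text.replace(ch, "e") for ch in acc_low_e if ch in text]
print(type(per_letter).__name__, len(per_letter))  # list 3

# list.count compares whole elements to "e"; no element is exactly "e".
print(per_letter.count("e"))  # 0

# The other attempt keeps only the accented characters, each turned into "e".
only_accents = [ch.replace(ch, "e") for ch in text if ch in acc_low_e]
print(only_accents.count("e"))  # 6

# str.count, by contrast, counts occurrences inside a single string.
print(per_letter[0].count("e"))  # 8 (only "é" was replaced in this copy)
```

Neither list is the single, fully replaced string that the `for` loop version produces, which is why the counts differ from the expected 12.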
If this is more than an exercise, I suggest you install `unidecode`. It'll help you remove all types of accents from your text in a single line. – Juan C, Mar 16 '20 at 14:45
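If installing a third-party package is not an option, a similar effect for accented Latin letters can be sketched with the standard library's `unicodedata` module. This is a swapped-in technique, not `unidecode` itself: `unidecode` transliterates to ASCII, whereas this sketch only drops combining marks. The helper name `strip_accents` is hypothetical.

```python
import unicodedata


def strip_accents(s):
    # NFD decomposition splits "é" into "e" plus a combining accent character.
    decomposed = unicodedata.normalize("NFD", s)
    # unicodedata.combining() is non-zero for combining marks, so skip those.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(strip_accents("e, é, ê, è, E, É, Ê, È"))  # e, e, e, e, E, E, E, E
```

Unlike the hard-coded `acc_low_e` list, this handles upper-case accented letters and other accents without listing them, which may or may not be what the exercise wants.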

Serge Ballesta · Accepted Answer · 2020-03-16T14:55:17.410

3

A list comprehension produces a list. Here you could build a list of characters and join it:

file_text = ''.join([t if t not in acc_low_e else 'e' for t in file_text])
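For context, a minimal sketch of the question's function with this line dropped in might look as follows. The commented-out attempts are removed and `is True` is shortened to a plain truth test, which is an assumption that the flags are always booleans.

```python
def count_letter_e_text(file_text, ignore_accents, ignore_case):
    e = "e"
    acc_low_e = ["é", "ê", "è"]

    if ignore_case:
        file_text = file_text.lower()

    if ignore_accents:
        # Build a list of single characters, swapping accented ones for "e",
        # then join the list back into one string so str.count works on it.
        file_text = "".join([e if t in acc_low_e else t for t in file_text])

    return file_text.count(e)


text = "Sentence 1 test has e, é, ê, è, E, É, Ê, È"
print(count_letter_e_text(text, True, True))    # 12
print(count_letter_e_text(text, False, True))   # 6
```

Note that `acc_low_e` only lists lower-case letters, so upper-case accented letters are replaced only when `ignore_case` lower-cases the text first.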

edited Mar 16 '20 at 14:55

answered Mar 16 '20 at 14:40

Serge Ballesta


This looks like what I'm after; however, I'm receiving an "invalid syntax" error (error points at the `r` in `for`). – md2614 Mar 16 '20 at 14:50
`.join` is a method on the string, so parentheses are necessary - I fixed the code accordingly. (It's actually possible to drop the square brackets now, but that requires more explanation.) – Karl Knechtel Mar 16 '20 at 14:55
@md2614: There was a typo in my line: I wrote `acc_low` when your code used `acc_low_e`. Fixed. – Serge Ballesta Mar 16 '20 at 14:56
I caught the typo but needed the parentheses. Thank you both. – md2614 Mar 16 '20 at 14:58
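On the remark about dropping the square brackets: when a generator expression is the only argument to a call, it needs no extra brackets or parentheses, so `join` can consume it directly. A short sketch with a made-up sample string:

```python
acc_low_e = ["é", "ê", "è"]
file_text = "café crème"  # sample input chosen for illustration

# List comprehension: builds the full list first, then joins it.
with_list = "".join(["e" if t in acc_low_e else t for t in file_text])

# Generator expression: same logic, passed straight to join.
with_gen = "".join("e" if t in acc_low_e else t for t in file_text)

print(with_list)              # cafe creme
print(with_list == with_gen)  # True
```

Both forms give the same string; which one to prefer here is mostly a matter of style.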

score 0 · Answer 2 · answered Mar 16 '20 at 14:37

I would use a regex replacement here:

import re

acc_low_e = ["é", "ê", "è"]
regex = '|'.join(acc_low_e)  # alternation pattern: é|ê|è
file_text = 'café mocha'
print(file_text)  # café mocha
file_text = re.sub(regex, 'e', file_text)
print(file_text)  # cafe mocha
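Applied to the question's counting task, the regex approach could be sketched like this. The helper name `count_e` is hypothetical, and `re.escape` is added only as a precaution in case the list ever contains regex metacharacters; it is not required for these three letters.

```python
import re

acc_low_e = ["é", "ê", "è"]

# Compile the alternation once; re.escape guards against special characters.
pattern = re.compile("|".join(re.escape(ch) for ch in acc_low_e))


def count_e(file_text):
    # Lower-case first so upper-case accented letters match the pattern too.
    return pattern.sub("e", file_text.lower()).count("e")


print(count_e("Sentence 1 test has e, é, ê, è, E, É, Ê, È"))  # 12
```

Because `re.sub` returns a single string, `.count("e")` behaves the same way it does in the `replace`-based versions.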

answered Mar 16 '20 at 14:37

Tim Biegeleisen

