Python - Replace list of characters with another list

Question

I have two lists:

wrong_chars = [
    ['أ','إ','ٱ','ٲ','ٳ','ٵ'],
    ['ٮ','ݕ','ݖ','ﭒ','ﭓ','ﭔ'],
    ['ڀ','ݐ','ݔ','ﭖ','ﭗ','ﭘ'],
    ['ٹ','ٺ','ٻ','ټ','ݓ','ﭞ'],
]

true_chars = [
    ['ا'],
    ['ب'],
    ['پ'],
    ['ت'],
]

For a given string, I want to replace every character in each wrong_chars row with the corresponding entry in true_chars. Is there a clean way to do that in Python?

score 8 · Answer 1 · edited May 23 '17 at 12:14


string module to the rescue!

There's a really handy method called translate (together with its helper maketrans, which in Python 2 lives in the string module) that does exactly what you're looking for. For Unicode text, though, you'll have to pass in your translation mapping as a dictionary.

The documentation is here

An example based on a tutorial from Tutorialspoint is shown below (Python 2):

>>> from string import maketrans  # Python 2 only
>>> trantab = maketrans("aeiou", "12345")
>>> "this is string example....wow!!!".translate(trantab)
'th3s 3s str3ng 2x1mpl2....w4w!!!'
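The maketrans import above only works on Python 2. As a later comment points out, Python 3 exposes maketrans as a static method on str instead. A minimal sketch of the same example, assuming Python 3:

```python
# Python 3: maketrans is a static method of str, not a function in the string module
trantab = str.maketrans("aeiou", "12345")
result = "this is string example....wow!!!".translate(trantab)
print(result)  # th3s 3s str3ng 2x1mpl2....w4w!!!
```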

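It looks like you're using Unicode here, though, and unicode.translate works slightly differently: it takes a dictionary that maps code points (the integers returned by ord) to replacement strings. You can look at this question to get a sense, but here's an example that should work for you more specifically: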

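# wrong_chars, true_chars and example must all hold unicode strings.
# Each true_chars[i] is a one-element list, so use its only item.
translation_dict = {}
for i, char_list in enumerate(wrong_chars):
    for char in char_list:
        translation_dict[ord(char)] = true_chars[i][0]

fixed = example.translate(translation_dict)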
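For reference, here is a self-contained sketch of the same idea on Python 3, using the lists from the question. It assumes every entry in wrong_chars is a single code point; str.maketrans raises ValueError for longer keys. The helper name fix is illustrative only.

```python
wrong_chars = [
    ['أ', 'إ', 'ٱ', 'ٲ', 'ٳ', 'ٵ'],
    ['ٮ', 'ݕ', 'ݖ', 'ﭒ', 'ﭓ', 'ﭔ'],
    ['ڀ', 'ݐ', 'ݔ', 'ﭖ', 'ﭗ', 'ﭘ'],
    ['ٹ', 'ٺ', 'ٻ', 'ټ', 'ݓ', 'ﭞ'],
]
true_chars = [['ا'], ['ب'], ['پ'], ['ت']]

# Build {wrong_char: true_char}; each true_chars entry is a one-element list.
mapping = {}
for wrongs, (true_char,) in zip(wrong_chars, true_chars):
    for ch in wrongs:
        mapping[ch] = true_char

# The one-argument form of str.maketrans accepts single-character keys.
table = str.maketrans(mapping)

def fix(text):
    """Return text with every wrong character replaced by its true one."""
    return text.translate(table)
```

Characters that are not in the table (spaces, other letters) pass through unchanged.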

edited May 23 '17 at 12:14 by Community · answered Jul 01 '15 at 16:56 by Slater Victoroff

Thanks for the good answer, but I have another question. I changed your code to `translation_dict[ord(char.decode('utf-8'))] = true_chars[i]`. Is this correct? I get the error `expected a character buffer object` on this line. – Chalist Jul 01 '15 at 17:41
@chalist you shouldn't have to decode the character to get the ord. Have you tried on the raw unicode object? – Slater Victoroff Jul 01 '15 at 21:11
Note that the `string` module does not contain a `maketrans` function in Python 3; it is only available in Python 2. If anyone wants to use `maketrans`, they need to call it on `str`: `str.maketrans(...)` – TheFaultInOurStars Feb 25 '22 at 17:11

score 2 · Answer 2 · answered Jul 02 '15 at 20:24

I merged your wrong and true characters into a list of dictionaries, each holding the wrong characters and the character that should replace them.
Link to a working sample: http://ideone.com/mz7E0R
And the code itself:

from __future__ import print_function

given_string = "ayznobcyn"
correction_list = [
    {"wrongs": ['x', 'y', 'z'], "true": 'x'},
    {"wrongs": ['m', 'n', 'o'], "true": 'm'},
    {"wrongs": ['q', 'r', 's', 't'], "true": 'q'},
]

processed_string = ""

for s in given_string:
    true_char = s  # keep the character unless a correction matches it
    for correction in correction_list:
        if s in correction['wrongs']:
            true_char = correction['true']
            break
    processed_string += true_char

print(given_string)
print(processed_string)

This code could be optimized further, and it ignores any Unicode problems there may be. Since you are using Farsi, you should take care of those.
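One possible optimization (a sketch, not the answerer's own code) is to flatten correction_list into a single dictionary once, so each character costs one lookup instead of a scan over every group, and to build the result with join instead of repeated string concatenation:

```python
given_string = "ayznobcyn"
correction_list = [
    {"wrongs": ['x', 'y', 'z'], "true": 'x'},
    {"wrongs": ['m', 'n', 'o'], "true": 'm'},
    {"wrongs": ['q', 'r', 's', 't'], "true": 'q'},
]

# Flatten the corrections once into {wrong_char: true_char}.
lookup = {w: c["true"] for c in correction_list for w in c["wrongs"]}

# Characters without a correction are kept unchanged.
processed_string = "".join(lookup.get(ch, ch) for ch in given_string)
print(processed_string)  # axxmmbcxm
```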

jfs · Accepted Answer · 2015-07-01T21:48:35.093


#!/usr/bin/env python
from __future__ import unicode_literals

wrong_chars = [
    ['1', '2', '3'],
    ['4', '5', '6'],
    ['7'],
]
true_chars = 'abc'

table = {}
for keys, value in zip(wrong_chars, true_chars):
    table.update(dict.fromkeys(map(ord, keys), value))
print("123456789".translate(table))

Output

aaabbbc89

edited Jul 01 '15 at 21:48 · answered Jul 01 '15 at 21:43 by jfs

@chalist: the code works as is on Python 2 and 3. Do you have `from __future__ import unicode_literals` at the top in your code? – jfs Jul 02 '15 at 10:39

@chalist: here's a [live example](http://ideone.com/7J9CWB) that demonstrates that it works. Update your question to include the *complete* (but minimal) code example with the full traceback, if any. – jfs Jul 02 '15 at 11:18
@chalist: a single user-perceived character may span *several* Unicode codepoints. (I've used `'abc'` as a shortcut for `['a', 'b', 'c']`). Use a list, to see the character boundaries: http://ideone.com/cweBU9 If a "wrong character" contains more than one Unicode codepoint then you could use `text.replace(multiple_codepoints, true_char)` or `re.sub("|".join(map(re.escape, ['1', '2', '3'])), 'a', text)` – jfs Jul 02 '15 at 13:58
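The re.sub approach in the comment above can be extended to handle several replacements in one pass: a minimal sketch, assuming the replacements are kept in a dict whose keys may be multi-code-point strings. The entries below (ALEF followed by a combining HAMZA ABOVE, and the precomposed ALEF WITH HAMZA ABOVE) are illustrative, not taken from the question, and the helper name fix is hypothetical.

```python
import re

# Hypothetical data: a "wrong character" may span several code points.
replacements = {
    "\u0627\u0654": "\u0627",  # ALEF + combining HAMZA ABOVE -> plain ALEF
    "\u0623": "\u0627",        # precomposed ALEF WITH HAMZA ABOVE -> plain ALEF
}

# Try longer keys first so a multi-code-point sequence wins over any prefix of it.
pattern = re.compile("|".join(
    map(re.escape, sorted(replacements, key=len, reverse=True))))

def fix(text):
    """Replace every matched sequence with its entry in replacements."""
    return pattern.sub(lambda m: replacements[m.group(0)], text)
```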

score 0 · Answer 4 · answered Jul 02 '15 at 10:11


Another idea: keep one list per group that also contains the true character, like this:

# Put the true character first in each list, followed by its wrong variants
new_chars = [["ا", "أ", "إ", "آ"], ["ب", "بِ"]]

def correct(ch):
    for group in new_chars:
        if ch in group:
            return group[0]
    return ch  # not a known wrong character: keep it

print(correct("إ"))  # ا
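Scanning every group for every character gets slow when the groups are long. A sketch of one alternative, assuming Python 3 and single-code-point entries: invert the "true character first" layout once into a translation table. The sample rows are taken from the question.

```python
# Groups in the layout above: the true character first, then its wrong variants.
new_chars = [
    ['ا', 'أ', 'إ', 'ٱ'],
    ['ب', 'ٮ', 'ݕ'],
]

# Invert once into {wrong: true}; each lookup no longer scans all groups.
table = str.maketrans({w: group[0] for group in new_chars for w in group[1:]})

print('إٮ'.translate(table))  # اب
```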

answered Jul 02 '15 at 10:11 by Hosein Remezan

Thanks, but the list is very big; some rows have over 100 characters. – Chalist Jul 02 '15 at 10:34
