Decode a text using a dictionary

Question

I'm trying to decode a text using this dictionary, which serves as the alphabet:

mydict = {'ぁ': 'あ', 'ァ': 'あ', '了': 'あ', 'ぃ': 'い', 'ィ': 'い', 'ﾚヽ': 'い', 'ﾚ丶': 'い', 'レ）': 'い', 'ﾚ`': 'い', 'L丶': 'い', 'L1': 'い', 'ﾚl': 'い', 'ぅ': 'う', 'ゥ': 'う', '宀': 'う', 'ヴ': 'ふ', 'ウ': 'う', 
'ぇ': 'え', 'ェ': 'え', '之': 'え', '工': 'I', 'ヱ': 'え', 'ぉ': 'を', 'ォ': 'お', '才': 'お', 'ｶゝ': 'か', 'ｶ丶': 'か', 'ｶヽ': 'か', 'ヵゝ': 'か', 'ｶ`': 'か', 'ｶゞ': 'か', 'カ': 'か', 'ヵゞ': 'が', 'カ゛': 
'ガ', '力゛': 'ガ', '(ｷ': 'き', '(≠': 'き', 'L≠': 'き', 'ｷ':'き','‡': 'き',"(ｷ\"\"": "ぎ", "(≠\"": "ぎ", "L≠\"": "ぎ", "ｷ\"":"ぎ","‡\"": "ぎ", '＜': 'く', '〈': 'く', '勹': 'く', 'ヶ': 'け', '(ﾅ': 'け', 'ﾚ†': 'け', 'ﾚﾅ': 'け', '|ナ': 'け', 'l+': 'け', 'Iﾅ': 'け',"＜\"": "ぐ", "〈\"": "ぐ", "勹\"": "ぐ", "ヶ\"": "げ", "(ﾅ\"": "げ", "ﾚ†\"": "げ", "ﾚﾅ\"": "げ", "|ナ\"": "げ", "l+\"": "げ", "Iﾅ\"": "げ", '〓': 'こ', '=': 'こ', ']': 'こ', '⊃': 'こ', '⊇': 'こ',"〓\"": "ご", "=\"": "ご", "]\"": "ご", "⊃\"": "ご", "⊇\"": "ご", '廾': 'さ', '±': 'さ', '(十': 'さ', 'L+': 'さ', '(+': 'さ', '､ﾅ': 'さ', 'ι': 'し', '∪': 'U', 'U': 'し', '￡': 'す', 'ﾌ､': 'す', '世': 'せ', 'ξ': 'そ', 'ζ': 'ら', '`ﾉ': 'そ', '丶/': 'そ', 'ヽ丿': 'そ', 'ﾅ=': 'た', '+=': 'た', '†ﾆ': 'た', 'ﾅﾆ': 'た', '十こ': 'た', '†こ': 'た', 'ﾅ⊇': 'た', 'T=': 'た', '十=': 'た', '夕': 'た', '干': 'ち', '千': 'ち', '于': 'ち', 'ろ＋': 'ち', 'っ': 'つ', 'ッ': 'つ', 'ﾂ【っ】': 'つ', '廾"': 'ざ', '±"': 'ざ', '(十"': 'ざ', 'L+"': 'ざ', '(+"': 'ざ', '､ﾅ"': 'ざ', 'ι"': 'じ', '∪"': 'じ', 'U"': 'じ', '￡"': 'ず', 'ﾌ､"': 'ず', '世"': 'ぜ', 'ξ"': 'ぞ', 'ζ"': 'ぞ', '`ﾉ"': 'ぞ', '丶/"': 'ぞ', 'ヽ丿"': 'ぞ', 'ﾅ="': 'だ', '+="': 'だ', '†ﾆ"': 'だ', 'ﾅﾆ"': 'だ', '十こ"': 'だ', '†こ"': 'だ', 'ﾅ⊇"': 'だ', 'T="': 'だ', '十="': 'だ', '夕"': 'だ', 'っ"': 'づ', 'ッ"': 'づ', 'ﾂ【っ】"': 'づ', '⊃"':'づ','⊃':'つ','干"': 'ぢ', '千"': 'ぢ', '于"': 'ぢ', 'ろ＋"': 'ぢ','τ': 'て', '〒': 'て', 'z': 'て', '乙': 'て', '┠': 'と', '┝': 'と', '┣': 'と', '├': 'と', '`⊂': 'と', '卜': 'と', '`c': 'と',"τ\"": 'で', 'z"': 'で', '乙"': 'で', '┠"': 'ど', '┝"': 'ど', '┣"': 'ど', '├"': 'ど', '`⊂"': 'ど', '卜"': 'ど', '`c"': 'ど', 'ﾅょ': 'な', '十ょ': 'な', '†ょ': 'な', 'ﾅg': 'な', '†ょゝ': 'な', '十': 'な', '(ﾆ': 'に', '|=': 'に', '丨ﾆ': 'に', 'L=': 'に', 'I=': 'に', '（⊇': 'に', 
'レこ': 'に', '(二': 'に', 'ﾚﾆ': 'に', 'йu': 'ぬ', 'ゐ': 'み', 'йё': 'ね', '/': 'の', '丿': 'の', 'σ': 'の', '⊂n': 'の', '＠': 'の', "'`": 'は', '八': 'は', 'l￡': 'は', '(￡': 'は', 'ﾉ|': 'は', 'ﾉl': 'は', 'ﾚ￡': 'は', 'ﾚよ': 'は', 'ﾊ〃': 'バ', 'ﾊo': 'パ', '匕': 'ひ', 'ﾋ〃': 'ビ', 'ﾋo': 'ピ', ',ζ,':  
'ふ', 'ﾌ〃': 'ブ', 'ﾌo': 'プ', '〜': 'へ', '∧': 'へ', 'ﾍ〃': 'べ', 'ﾍo': 'ペ', '朮': 'ほ', 'ﾚま': 'ほ', 'ﾎ〃': 'ボ', 'ﾎo': 'ポ', 'ма': 'ま', 'мα': 'ま', '彡': 'み', '￡′': 'む', '厶': 'む', '×': 'め', 'x': 'め', 'χ': 'X', '乂': 'X', '〆': 'め', 'м○': 'も', 'мσ': 'も', '=し': 'も', '=L': 'も', 'ゃ': 
'や', 'ャ': 'や', 'ゅ': 'ゆ', 'ュ': 'ゆ', 'ょ': 'よ', 'ョ': 'よ', '∋': 'よ', 'чo': 'よ', '∃': 'よ', 'яа': 'ら', 'b`': 'ら', 'L|': 'り', 'l)': 'り', 'ﾚ｣': 'り', 'ﾚ)': 'り', '┗』': 'り', '└丿': 'り', 'v)': 'り', '丶)': 'り', 'ゐ': 'る', 'ゑ': 'る', '儿': 'る', 'lﾚ': 'る', '｣レ': 'る', '｜ﾚ': 'る', 'ﾉﾚ': 'る', '/ﾚ': 'る', 'яё': 'れ', 'з': 'ろ', 'З': 'ろ', '回': '口', 'ゎ': 'わ', 'ヮ': 'わ', 'wα': 'わ', 'щo': 'を', 'ε': 'を', 'ω': 'ん', '冫': 'ん', 'w': 'ん', 'h': 'ん', 'ﾝ': 'ん', 'ｿ': 'ん', '〃': '濁点', '”': '濁点', '¨': '濁点', '→': 'ー', '⇒': 'ー', 'o': '。', '○': '。', 'まぢ': 'まじ', 'ッ塚': 'てゆうか', 'ッ乙ゆｳねぇ': 'て言っていて', '王見': '現', '禾斗': '科', '禾ﾑ': '私', 'ﾛ十': '叶', 'ﾃ殳': '役', 'ｲ歹ﾘ': '例', '辶斤': '近', 'ｲ廴聿': '健', 'ﾅ月': '有', 'ﾀ。': '名', '⊃i⊂': '氷', 'ﾗ|＜': '氷', '无': '天', '夲': '本', '來': '来', 'ﾈ申': '神', '走召': '超', '木木': '林', '木交': '校', '糸色': '絶', '糸及': '級', '金失': '鉄', '金矢': '鉄', '木市': '柿', 'ｲ子': '仔', '愛ιτゑ': '愛してる', '愛Uτゑ': '愛してる', '人㊥ょU': '仲良し', 'イ㊥夜歹ﾋ': '仲良し', '言身寸': '謝', '糸冬': '終', '文寸': '対', 'イ更戸斤': '便所', '木圭哥欠丈乙': '桂歌丸', '因': '大', '囚': '人', '困': '木', '圏': '巻', '圉': '幸', '囦': '水', '囝': '子', '囡': '女', '団': '寸', '囲': '井', '囮': '化', '园': '元', '図': '爪', '囨': '不', '固': '古', '囶': '八土', '国': '玉', '囹': '令', '4ほぅ圀': '四方八方', '囿': '有', '圕': '書', '圓': '員', '仲仔': '仲良し', 'Θ': '日', '曰': '日', 'Φ': '中', '㊥': '中', '升': 'チート', 'ﾃﾃ': '行', '〒〒': '行', '糸合米斗': '給料', '京尤シ舌': '就活', 'ｲ士': '仕', 'ｲ木': '休', 'ﾈ兄': '祝', '言売': '読', '谷欠': '欲', '原頁': '願', '月月': '朋', '言吾': '語', '言忍': '認', '糸売': '続', 'ｲ吏': '使', 'ﾀﾋ': '死', '㊤': '上', '㊦': '下', '㊧': '左', 'ﾅｪ': '左', '㊨': '右', '㊞': '印', 'E卩': '印', '闩': 'A', '月': 'A', 'Д': 'A', 'д': 'A', '＠': 'A', 'Å': 'A', '∀': 'A', '吕': 'B', '官': 'B', '♭': 'B', 'в': 'B', 'ь': 'B', '匚': 'C', '匸': 'C', '￠': 'C', '⊂': 'C', '℃': 'C', '囙': 'D', 'ヨ': 'E', '巨': 'E', '臣': 'E', '巳': 'E', 'ㅌ': 'E', 'ё': 'E', '孒': 'F', '下': 'F', '⊂┐': 'G', 'C┐': 'G', '丩': 'H', 'н': 'H', '｜': 'L', 'し': 'J', '」': 'J', '』': 'J', '|く': 'K', 'κ': 'K', '└': 'L', '从': 
'M', 'ﾍﾍ': 'M', '川': 'M', '瓜': 'M', 'м': 'M', '冂': 'N', '∩': 'N', 'и': 'N', 'И': 'N', '口': 'O', '○': 'O', 'ο': 'O', '尸': 'P', 'ρ': 'P', '电': 'Q', 'O、': 'Q', '尺': 'R', 'γ': 'R', 'я': 'R', 'Я': 'R', '丂': 'S', '＄': 'S', '∫': 'S', '丁': 'T', 'て': 'T', '十': 'T', 'т': 'T', '凵': 'U', 'ц': 'U', 'レ': 'V', '∨': 'V', '山': 'W', 'ш': 'W', 'щ': 'W', 'Ш': 'W', 'Щ': 'W', '×': 'X', 'ソ': 'Y', '￥': 'Y', 'ч': 'Y', 'Ч': 'Y', '乙': 'Z'}

And I have this string:

mystr ="ｷょぅゎ、1囚τ\"〒〒ｶヽ世τ㊦±ぃ。"

To translate this text into correct Japanese, I use this code:

new_d = {}
for k in sorted(mydict, key=len, reverse=True):
    new_d[k] = mydict[k]
for el,it in new_d.items(): 
    if el in mystr:
        mystr = mystr.replace(el.strip(),new_d.get(el).strip())
    print(mystr.strip())

I get this string printed:

きようわ、1人T行かせTFさい。

whereas I should get

きょうは、1人で行かせて下さい。

Can somebody help me, please? I don't understand why it's not working.

For those of us who don't know these languages, can you narrow down the problem in the translation? — Barmar, Feb 08 '22 at 17:53
A couple of hints to help simplify your code and possibly correct/narrow down your issue. You don't need to sort anything. Iterate over your string and not your dictionary. Each character you are iterating over, just do a simple dictionary lookup: `val = your_dict.get(char)`. If it is not in the dictionary, the default behaviour of `get` is to return `None`. As you find characters, you can simply build your string through each iteration: `new_str += val`. — idjaw, Feb 08 '22 at 17:57
How are you determining how to group certain characters? I see some keys in your dictionary are multiple characters. — ddejohn, Feb 08 '22 at 17:57
One problem you'll have is if the result of some translations are also keys of other translations. This will cause double translations. — Barmar, Feb 08 '22 at 18:04
To Barmar's point, there would need to be a better way to permutate over that string to get combinations. But, it seems like the string is not intended to be handled by taking different permutations of the string? There would have to be some knowledge of what combinations of characters should be grouped together. I would say an easy way to solve this would be to use a list to construct the parts of each string. This way you have your grouping of characters already. But, this is all just guessing..... — idjaw, Feb 08 '22 at 18:08

score 1 · Accepted Answer · answered Feb 08 '22 at 19:27

Your dictionary does not map all characters to the ones you are expecting.

For example:

"ゎ" maps to "わ"
"τ"  maps to "て"

Your process also runs each replacement on the result of the previous ones, so characters that were already mapped can be mapped again on a later iteration.
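A minimal sketch of that double translation, using only four entries copied from the question's dictionary (the names `table`, `text` and `chained` are just for illustration):

```python
# Four entries taken from the question's dictionary.
# The outputs of the first and third entries are themselves keys.
table = {'τ': 'て', 'て': 'T', '㊦': '下', '下': 'F'}

text = 'τ㊦'
chained = text
for key, value in table.items():
    # Each replace() works on the already-modified string,
    # so 'て' (produced from 'τ') is later turned into 'T'.
    chained = chained.replace(key, value)

print(chained)  # TF
```

This is the same "TF" that shows up in the printed output in the question.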

To avoid double translating you can convert your string like this:

mystr = "".join(mydict.get(c,c) for c in mystr)

Which will result in:

きようわ、1人て"ててｶヽせて下さい。

You will need to fix the letter mappings in the dictionary to obtain your desired result:

きょうは、1人で行かせて下さい。
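Note that a per-character lookup cannot match keys longer than one character (such as 〒〒 or ｶヽ), so those sequences are not decoded as a unit. One possible way to handle them in a single pass, sketched here under the assumption that the longest matching key should always win, is a regular expression built from the keys. The `decode` helper and the `subset` dictionary are hypothetical names; `subset` holds only a few entries copied from the question.

```python
import re

# A small subset of the question's dictionary, for illustration only.
subset = {
    'ｷ': 'き', 'ょ': 'よ', 'ぅ': 'う', 'ゎ': 'わ', '囚': '人',
    'τ': 'て', 'τ"': 'で', '〒': 'て', '〒〒': '行', 'ｶヽ': 'か',
    '世': 'せ', '㊦': '下', '±': 'さ', 'ぃ': 'い',
    '下': 'F', 'て': 'T',  # outputs of other entries; harmless in a single pass
}

def decode(text, table):
    """Replace every key found in text, scanning the input only once."""
    # Longest keys first, so 'τ"' is tried before 'τ' and '〒〒' before '〒'.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # re.sub never rescans text it has already replaced,
    # so there is no double translation.
    return pattern.sub(lambda m: table[m.group(0)], text)

decoded = decode('ｷょぅゎ、1囚τ"〒〒ｶヽ世τ㊦±ぃ。', subset)
print(decoded)  # きようわ、1人で行かせて下さい。
```

With this subset, the only remaining differences from the desired output (よ instead of ょ, わ instead of は) come from the dictionary entries themselves, which is the part that still needs fixing.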
