KeyError when using Devanagari (Hindi) script for a dictionary key

Question

This part of the code works and builds the dictionary correctly:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import collections
from operator import itemgetter

S_eng = "Hindi constitutional form in India first"
S_hindi = "हिन्दी संवैधानिक रूप से भारत की प्रथम "
word_count = collections.defaultdict(dict)

for st in S_eng.split(" "):
    for st_1 in S_hindi.split(" "):
        print type(st), type(st_1)
        word_count[st][st_1] = 1

print word_count

But when I read a file containing English and Hindi sentences and try to build a dictionary from it, the following happens:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

from collections import defaultdict

P = defaultdict(dict)
i = "your"
j = "अपने" 

if(P[i][j] >= 0):
    P[i][j] += 1

else:
    P[i][j] = 0

print P

This gives the following error:

Traceback (most recent call last):
  File "lerxical_probab.py", line 31, in <module>
    if(P[i][j] >= 0):
KeyError: '\xe0\xa4\x85\xe0\xa4\xaa\xe0\xa4\xa8\xe0\xa5\x87'

I also checked the types of i and j; both are 'str'. Can someone please help with this?

And why does one version work while the other doesn't?

You really should be using Python 3, which has much nicer Unicode handling. In the meantime, you should use u-strings for Unicode, e.g. `j = u"अपने" ` — PM 2Ring, May 23 '18 at 08:18
@PM2Ring Oh yes, I see Python 3 has much nicer Unicode handling; thanks for pointing this out. But when I use it, I'm still stuck on the same error; it's just that the error is now readable. Any pointers on this? "KeyError: 'अपने'" — SilentFlame, May 23 '18 at 08:28
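In Python 2 a plain `"..."` literal in a UTF-8 source file is a byte string, which is why the KeyError above prints escaped bytes rather than the word itself. A small Python 3 sketch (an illustration, not part of the original thread) showing that those bytes are simply the UTF-8 encoding of अपने:

```python
# Python 3: str holds Unicode text, bytes holds raw encoded data.
word = "अपने"

# Encoding to UTF-8 gives exactly the bytes shown in the Python 2 KeyError.
encoded = word.encode("utf-8")
print(encoded)  # b'\xe0\xa4\x85\xe0\xa4\xaa\xe0\xa4\xa8\xe0\xa5\x87'

# Decoding those bytes gives back the original text key.
print(encoded.decode("utf-8") == word)  # True

# Four code points (vowel, two consonants, vowel sign), 3 bytes each in UTF-8.
print(len(word), len(encoded))  # 4 12
```

So the key in the traceback is the same word; only its printed representation differs between byte strings and Unicode strings.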
Sure. You need `P = defaultdict(lambda: defaultdict(int))`. See https://stackoverflow.com/a/2600813/4014959 Also check the other answers there for useful ideas. — PM 2Ring, May 23 '18 at 08:29
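A minimal Python 3 sketch of that suggestion, assuming the goal is to count English/Hindi word pairings. With two levels of `defaultdict`, a missing inner key reads as `0`, so `+=` works without any existence check. The loop reuses the sample sentences from the first code block:

```python
from collections import defaultdict

# Outer level: English word -> inner defaultdict, created on first access.
# Inner level: Hindi word -> count; reading a missing key yields 0.
P = defaultdict(lambda: defaultdict(int))

i = "your"
j = "अपने"

# No KeyError: the inner dict and the starting 0 are both created on demand.
P[i][j] += 1
P[i][j] += 1
print(P[i][j])  # 2

# The same pattern counts every English/Hindi word pairing from the question's
# sample sentences; split() with no argument skips the empty trailing token.
S_eng = "Hindi constitutional form in India first"
S_hindi = "हिन्दी संवैधानिक रूप से भारत की प्रथम "
for st in S_eng.split():
    for st_1 in S_hindi.split():
        P[st][st_1] += 1

print(len(P))              # 7: "your" plus the 6 English words
print(P["India"]["भारत"])  # 1
```

Note that this counts the first occurrence as 1, whereas the `else` branch in the question stores 0; adjust the starting value if 0 was really intended.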
I'd also like to know how it worked in the 1st case but threw an error in the 2nd. — SilentFlame, May 23 '18 at 08:38
The 1st code block works because you made a `defaultdict` of `dict`. So when you do `word_count[st][st_1] = 1` if `word_count[st]` doesn't exist then a new `dict` is created as if you did `newdict = {}; word_count[st] = newdict; newdict[st_1] = 1 `. So the operation on `newdict` is a simple adding of a key-value pair to `newdict`. But in your 2nd code block you read `P[i][j]` (in the `>=` comparison, and `+=` reads it as well), so it tries to look up a non-existent number in a freshly created, empty inner dictionary. To handle that automatically we need 2 levels of `defaultdict`, which that `lambda` gives us. — PM 2Ring, May 23 '18 at 08:47
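A short Python 3 sketch of that explanation, using made-up ASCII placeholder keys ("the", "my", "mine") to show that the KeyError has nothing to do with the Devanagari script: with one level of `defaultdict`, only the outer lookup creates a missing entry. The last part shows a membership-test alternative that keeps the question's if/else structure:

```python
from collections import defaultdict

# One level of defaultdict: only the OUTER lookup creates a missing entry.
P = defaultdict(dict)

# Assignment works: P["your"] is auto-created as {}, then a key is stored in it.
P["your"]["the"] = 1

# Reading a missing INNER key raises KeyError, whatever script the key uses.
try:
    P["my"]["mine"] += 1      # P["my"] becomes {}, but {}["mine"] does not exist
except KeyError as exc:
    print("KeyError:", exc)   # KeyError: 'mine'

# Without a second defaultdict level, test membership instead of reading the
# value; `P[i][j] >= 0` in the question is itself a read, so it fails first.
i, j = "my", "mine"
if j in P[i]:
    P[i][j] += 1
else:
    P[i][j] = 1
print(P[i][j])  # 1
```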