This part of the code works and builds the dictionary correctly:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import collections
from operator import itemgetter

S_eng = "Hindi constitutional form in India first"
S_hindi = "हिन्दी संवैधानिक रूप से भारत की प्रथम "
word_count = collections.defaultdict( dict )

for st in S_eng.split(" "):
    for st_1 in S_hindi.split(" "):
        print type(st), type(st_1)
        word_count[st][st_1] = 1

print word_count

But when I read a file of English and Hindi sentences and try to build the same kind of dictionary, the following happens:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

from collections import defaultdict

P = defaultdict(dict)
i = "your"
j = "अपने"

if(P[i][j] >= 0):
    P[i][j] += 1

else:
    P[i][j] = 0

print P

This gives the following error:

Traceback (most recent call last):
  File "lerxical_probab.py", line 31, in <module>
    if(P[i][j] >= 0):
KeyError: '\xe0\xa4\x85\xe0\xa4\xaa\xe0\xa4\xa8\xe0\xa5\x87'

I checked the types of i and j too, and both are 'str'. Can someone please help with this?

And why does the first snippet work while the second one doesn't?

SilentFlame
  • You really should be using Python 3, which has much nicer Unicode handling. In the meantime, you should use u-strings for Unicode, e.g. `j = u"अपने"` – PM 2Ring May 23 '18 at 08:18
  • @PM2Ring Oh yes, I see that Python 3 has much nicer Unicode handling, thanks for pointing this out. But when I use it, I'm still stuck at the same error; it's just that the error is now readable: `KeyError: 'अपने'`. Any pointers? – SilentFlame May 23 '18 at 08:28
  • Sure. You need `P = defaultdict(lambda: defaultdict(int))`. See https://stackoverflow.com/a/2600813/4014959 Also check the other answers there for useful ideas. – PM 2Ring May 23 '18 at 08:29
  • @PM2Ring Oh yes this worked. Thanks a lot for the help. :) – SilentFlame May 23 '18 at 08:33
  • At the same time, I would like to know why it worked in the 1st case and threw an error in the 2nd. – SilentFlame May 23 '18 at 08:38
  • The 1st code block works because you made a `defaultdict` of `dict`. So when you do `word_count[st][st_1] = 1`, if `word_count[st]` doesn't exist then a new `dict` is created, as if you did `newdict = {}; word_count[st] = newdict; newdict[st_1] = 1`. So the operation on `newdict` is a simple adding of a key-value pair. But in your 2nd code block you read `P[i][j]` before assigning it (both the `if` test and `+=` do a lookup first), so it tries to fetch a non-existent number from a freshly created plain `dict`, which raises `KeyError`. To handle that automatically we need 2 levels of `defaultdict`, which that `lambda` gives us. – PM 2Ring May 23 '18 at 08:47
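
The difference described in the comments can be sketched as follows. This is a minimal illustration rather than the asker's full program: it is written for Python 3, and the names `plain`, `nested` and `read_failed` are made up for the example.

```python
from collections import defaultdict

i = "your"
j = "अपने"

# One level of defaultdict: a missing outer key creates a plain dict,
# but a missing inner key on that plain dict still raises KeyError.
plain = defaultdict(dict)
plain[i][j] = 1            # plain assignment works, as in the first snippet
try:
    plain["my"][j] >= 0    # reading a missing inner key fails
    read_failed = False
except KeyError:
    read_failed = True

# Two levels of defaultdict: a missing inner key defaults to int(), i.e. 0,
# so += works without checking whether the key exists.
nested = defaultdict(lambda: defaultdict(int))
nested[i][j] += 1
nested[i][j] += 1

print(read_failed)     # True
print(nested[i][j])    # 2
```

With the nested version, the `if`/`else` check from the question should no longer be needed; `P[i][j] += 1` on its own does the counting.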

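As a side note on the traceback: the escaped key in the `KeyError` is just the UTF-8 byte sequence of the Hindi word, which is how Python 2 displays a non-ASCII `str`. The check below, written for Python 3, confirms this; `word` and `raw` are illustrative names.

```python
# -*- coding: utf-8 -*-
word = "अपने"

# Encode the text to UTF-8 bytes, which is what a Python 2 str literal holds.
raw = word.encode("utf-8")

print(raw == b"\xe0\xa4\x85\xe0\xa4\xaa\xe0\xa4\xa8\xe0\xa5\x87")  # True
print(raw.decode("utf-8") == word)                                  # True
print(len(word), len(raw))                                          # 4 12
```

So the key itself is fine. The error reports a missing dictionary entry, not an encoding problem.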