This part of the code works and builds the dictionary as expected:
#!/usr/bin/env python
#-*- coding: utf-8 -*-
import collections
from operator import itemgetter
S_eng = "Hindi constitutional form in India first"
S_hindi = "हिन्दी संवैधानिक रूप से भारत की प्रथम "
word_count = collections.defaultdict( dict )
for st in S_eng.split(" "):
    for st_1 in S_hindi.split(" "):
        print type(st), type(st_1)
        word_count[st][st_1] = 1
print word_count
But when I read a file containing English and Hindi sentences and try to build a dictionary from it, the following happens:
#!/usr/bin/env python
#-*- coding: utf-8 -*-
from collections import defaultdict
P = defaultdict(dict)
i = "your"
j = "अपने"
if(P[i][j] >= 0):  # the KeyError is raised on this line
    P[i][j] += 1
else:
    P[i][j] = 0
print P
This gives the following error:
Traceback (most recent call last):
  File "lerxical_probab.py", line 31, in <module>
    if(P[i][j] >= 0):
KeyError: '\xe0\xa4\x85\xe0\xa4\xaa\xe0\xa4\xa8\xe0\xa5\x87'
I also checked the types of i and j; both are 'str'.
Can someone please help with this?
Why does the first snippet work while the second one doesn't?
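Reduced further, the only difference I can see is that the first snippet just assigns to word_count[st][st_1], while the second one reads P[i][j] before anything has been stored there. Below is a minimal sketch of that reduction. The names Q, raised and counts are made up for illustration and are not in my script, and the defaultdict(int) variant is only one possible workaround, assuming the goal is to count co-occurrences starting from 0.

```python
# -*- coding: utf-8 -*-
from collections import defaultdict

# Assignment only: defaultdict(dict) creates the inner dict on demand,
# so storing a value under a new inner key never raises.
P = defaultdict(dict)
P["your"]["अपने"] = 1

# Read before write: the inner object is a plain dict, so looking up
# a key that was never stored raises KeyError.
Q = defaultdict(dict)
try:
    Q["your"]["अपने"] >= 0
    raised = False
except KeyError:
    raised = True

# Possible workaround (assumption: missing counts should start at 0):
# make the inner mapping a defaultdict(int) as well.
counts = defaultdict(lambda: defaultdict(int))
counts["your"]["अपने"] += 1
```

If that reading is right, the Hindi text would not be the cause at all, and any key that has not been stored yet, ASCII or not, should fail the same way.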