I have two Python dictionaries containing information about Japanese words and characters:
- `vocabDic`: contains vocabulary; key: word, value: a dictionary with information about it
- `kanjiDic`: contains kanji (single Japanese characters); key: kanji, value: a dictionary with information about it
Now I would like to iterate through each character of each word in `vocabDic` and look up that character in the kanji dictionary. My goal is to create a CSV file that I can then import into a database as a join table for vocabulary and kanji.
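For reference, here is a minimal sketch of the intended join, assuming (as the code in this question suggests) that every value has an `'id'` key and vocabulary values also have `'lang'` and `'text'`. The sample data and the `build_join_rows` name are hypothetical. All text is kept as `unicode`, so iterating a word yields one character at a time; it is written to run under Python 2.6 as well as Python 3:

```python
from __future__ import print_function

# Hypothetical sample data shaped like the dictionaries in the question.
# u'\u6f22\u5b57' is the word "kanji" (two characters).
vocabDic = {
    u'\u6f22\u5b57': {'id': 10, 'lang': 'jpn', 'text': u'\u6f22\u5b57'},
    u'hello': {'id': 11, 'lang': 'eng', 'text': u'hello'},
}
kanjiDic = {
    u'\u6f22': {'id': 1},
    u'\u5b57': {'id': 2},
}


def build_join_rows(vocabDic, kanjiDic):
    """Return (joinId, kanjiId, vocabId) tuples for each kanji found in a Japanese word."""
    rows = []
    for val in vocabDic.values():
        if val['lang'] != 'jpn':  # only check Japanese words
            continue
        for ch in val['text']:  # unicode string: one item per character
            kanji = kanjiDic.get(ch)
            if kanji is not None:
                # Join ids start at 1, like kanjiVocabJoinCount in the question.
                rows.append((len(rows) + 1, kanji['id'], val['id']))
    return rows


print(build_join_rows(vocabDic, kanjiDic))
```

Because the rows contain only integers, they can be passed straight to `csv.writer(...).writerow()`; Python 2's `csv` module does not accept `unicode` input, but it converts numbers with `str()`.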
My Python version is 2.6.
My code is as follows:

```python
import csv

kanjiVocabJoinWriter = csv.writer(open('kanjiVocabJoin.csv', 'wb'), delimiter=',',
                                  quotechar='|', quoting=csv.QUOTE_MINIMAL)
kanjiVocabJoinCount = 1

# loop through dictionary
for key, val in vocabDic.iteritems():
    # only check Japanese words ('==' compares values; 'is' compares object identity)
    if val['lang'] == 'jpn':
        vocab = val['text']
        print vocab
        # loop through vocab string
        for v in vocab:
            test = kanjiDic.get(v)
            print v
            print test
            if test is not None:
                print str(kanjiVocabJoinCount) + ',' + str(test['id']) + ',' + str(val['id'])
                # a csv writer object is not callable; rows are written with writerow()
                kanjiVocabJoinWriter.writerow([str(kanjiVocabJoinCount), str(test['id']), str(val['id'])])
                kanjiVocabJoinCount = kanjiVocabJoinCount + 1
```
If I print the variables to the command line, I get:
- `vocab`: works, prints in Japanese
- `v` (one character of the vocab in the for loop): `�`
- `test` (the character looked up in `kanjiDic`): `None`
It seems to me that the for loop messes up the encoding.
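A small check that would be consistent with this, assuming `val['text']` holds UTF-8 encoded bytes (a plain `str` in Python 2.6) rather than a `unicode` object: iterating such a string walks over bytes, not characters, so each `v` would be a fragment of a kanji. The byte values below are the UTF-8 encoding of `u'\u6f22\u5b57'`; the snippet runs under Python 2.6 and 3:

```python
from __future__ import print_function

# UTF-8 bytes for u'\u6f22\u5b57' (two kanji, three bytes each).
raw = b'\xe6\xbc\xa2\xe5\xad\x97'

# Iterating the undecoded string yields one item per byte
# (one-byte str objects in Python 2, ints in Python 3).
byte_items = list(raw)

# Decoding the whole word once gives a unicode string; iterating it yields characters.
text = raw.decode('utf-8')
char_items = list(text)

print(len(byte_items), len(char_items))  # prints: 6 2
```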
I have tried various functions (`decode`, `encode`, ...) but have had no luck so far.
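A sketch of why placement matters, under the same UTF-8 assumption and with a hypothetical `kanjiDic` keyed by `unicode` characters: a single byte of a multi-byte character cannot be decoded on its own, and it does not match a `unicode` key either, so `decode` only helps when it is applied to the whole word before the loop:

```python
from __future__ import print_function

raw = b'\xe6\xbc\xa2\xe5\xad\x97'  # assumed UTF-8 bytes for u'\u6f22\u5b57'
kanjiDic = {u'\u6f22': {'id': 1}, u'\u5b57': {'id': 2}}  # hypothetical, unicode keys

# Decoding one byte of a three-byte character fails: it is only part of a character.
try:
    raw[0:1].decode('utf-8')
    partial_ok = True
except UnicodeDecodeError:
    partial_ok = False

# The same one-byte slice is not equal to any unicode key, so the lookup returns None.
byte_lookup = kanjiDic.get(raw[0:1])

# Decoding the whole word first yields characters that match the unicode keys.
ids = [kanjiDic[ch]['id'] for ch in raw.decode('utf-8') if ch in kanjiDic]

print(partial_ok, byte_lookup, ids)
```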
Any ideas on how I could get this working?
Help would be very much appreciated.