I am trying to find Chinese words in two different files, but it didn't work. So I tried searching for the words in the same file I got them from, and it doesn't find them there either. How is that possible?
import codecs
chin_split = codecs.open("CHIN_split.txt", "r+", encoding="utf-8")
I used this for the regex code:
import re
for n in re.findall(ur'[\u4e00-\u9fff]+', chin_split.read()):
    print n in re.findall(ur'[\u4e00-\u9fff]+', chin_split.read())
How come I only get False printed?
FYI, I tried this and it works:
for x in [1, 2, 3, 4, 5, 6, 6]:
    print x in [1, 2, 3, 4, 5, 6, 6]
BTW, chin_split contains words in English, Hebrew and Chinese. Some lines from CHIN_split.txt:
he daodan 核导弹 טיל גרעיני
hedantou 核弹头 ראש חץ גרעיני
helu 阖庐 "ביתו, מעונו
helu 阖庐 שם מלך וו בתקופת ה'אביב והסתיו'"
huiwu 会晤 להיפגש עם
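
One possible explanation for the all-False output, offered as a minimal sketch rather than a confirmed diagnosis: a file object's read() consumes the stream, so a second read() on the same handle returns an empty string, and findall on an empty string returns an empty list. The list example works because a list literal can be evaluated any number of times. The snippet below uses io.StringIO as a stand-in for the opened file (an assumption made so it runs without CHIN_split.txt), and the names sample, stream, first, second and words are hypothetical.

```python
# -*- coding: utf-8 -*-
import io
import re

# Stand-in for CHIN_split.txt (assumption: an in-memory text stream
# behaves like the opened file with respect to repeated read() calls).
sample = u"he daodan 核导弹\nhedantou 核弹头\nhuiwu 会晤\n"
stream = io.StringIO(sample)

pattern = re.compile(u'[\u4e00-\u9fff]+')

first = pattern.findall(stream.read())   # consumes the whole stream
second = pattern.findall(stream.read())  # already at the end: read() returns u''

print(len(first))   # number of Chinese words found on the first pass
print(len(second))  # nothing left to match on the second pass

# Reading once and reusing the list makes every membership test succeed.
stream.seek(0)                           # rewind before reading again
words = pattern.findall(stream.read())
results = [w in words for w in words]
print(all(results))
```

If this is what is happening, storing the result of chin_split.read() in a variable once, or calling chin_split.seek(0) before the second read(), would be two ways to get the expected True values.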