I have a large list of replacements, like the one below.
The replacement file, list.txt:
人の,NN
人の名前,FF
And the data in which to make the replacements, text.txt:
aaa人の abc 人の名前def ghi
I want to replace the text as shown below, using list.txt.
>>> my_func('aaa人の abc 人の名前def ghi')
'aaaNN abc FFdef ghi'
This is my code, but I think it is quite inefficient for processing large data.
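import re

# Build the lookup table from list.txt (one "key,value" pair per line).
d = {}
with open('list.txt', 'r', encoding='utf8') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(',', 1)
        d[key] = value

with open('text.txt', 'r', encoding='utf8') as f:
    txt = f.read()

st = 0
lst = []
# [\u4e00-\u9fea\u3040-\u309f] covers the Unicode ranges for kanji (CJK ideographs) and hiragana
for match in re.finditer(r"([\u4e00-\u9fea\u3040-\u309f]+)", txt):
    st_m, ed_m = match.span()
    lst.append(txt[st:st_m])  # text before the match
    search = txt[st_m:ed_m]
    rpld = d.get(search, search)  # leave runs that are not in the list unchanged
    lst.append(rpld)
    st = ed_m
lst.append(txt[st:])  # remaining text after the last match
print(''.join(lst))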
Please let me know if there is a better way.
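For reference, one direction that might be worth comparing against is a single compiled alternation of all keys, applied with re.sub and a callback. The following is only a minimal sketch under assumptions: build_replacer is a hypothetical helper name, the mapping is hard-coded here instead of being read from list.txt, and keys are sorted longest first so that 人の名前 is tried before its prefix 人の. Note that, unlike the code above, it replaces keys wherever they occur rather than only whole runs of Japanese characters.

```python
import re

def build_replacer(mapping):
    """Return a function that replaces every key of `mapping` in one pass.

    Hypothetical helper for illustration, not part of any library.
    """
    if not mapping:
        return lambda text: text  # nothing to replace
    # Longest keys first, so that '人の名前' is tried before its prefix '人の'.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)

# Assumed in-memory mapping standing in for the contents of list.txt
mapping = {"人の": "NN", "人の名前": "FF"}
my_func = build_replacer(mapping)
print(my_func("aaa人の abc 人の名前def ghi"))  # aaaNN abc FFdef ghi
```

The pattern is compiled once and the text is scanned in a single pass. Whether this is actually faster for a very large list is something I would need to measure, since a long alternation can itself be slow to match.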