I have created a function that is supposed to read a text file and replace many ASCII characters with their Unicode equivalents. The problem is that the function does not replace any characters in the string unless I remove all but one of the items from the dictionary. I have experimented all day but cannot find the solution.
Here is the function:
import re
match = {
    # the original dictionary contains over 100 items
    "᾿Ι" : "Ἰ",
    "᾿Α" : "Ἀ",
    "´Α" : "Ά",
    "`Α" : "Ὰ",
    "᾿Α" : "Ἀ",
    "᾿Ρ" : "ῤ",
    "῾Ρ" : "Ῥ"
}
with open("file.txt", "r", encoding="utf-8") as file, open("OUT.txt", "w", encoding="utf-8") as newfile:
    def replace_all(text, dict):
        for i, j in dict.items():
            result, count = re.subn(r"%s" % i, j, str(text))
        return result, count
    # start the function
    string = file.read()
    result, count = replace_all(string, match)
    # write out the result
    newfile.write(result)
    print("Changes: " + str(count))
The text file contains many lines similar to the one below:
Βίβλος γενέσεως ᾿Ιησοῦ Χριστοῦ, υἱοῦ Δαυῒδ, υἱοῦ ᾿Αβραάμ.
Here the characters "᾿Ι" and "᾿Α" are supposed to be replaced with "Ἰ" and "Ἀ".
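One likely cause, assuming the `return` sits after the loop: every pass calls `re.subn` on the original `text`, so each iteration throws away the previous result and only the last dictionary entry has any effect. The sketch below feeds each result back into the next substitution and sums the counts. The shortened `match` dictionary, the parameter name `mapping`, and the use of `re.escape` (in case a key ever contains a regex metacharacter) are assumptions for illustration, not part of the original code.

```python
import re

# Shortened, illustrative subset of the dictionary from the question.
match = {
    "᾿Ι": "Ἰ",
    "᾿Α": "Ἀ",
    "῾Ρ": "Ῥ",
}


def replace_all(text, mapping):
    """Apply every replacement in `mapping` to `text`, one after another."""
    total = 0
    for old, new in mapping.items():
        # Reassign `text` so the next pass starts from the previous result
        # instead of from the untouched input.
        text, count = re.subn(re.escape(old), new, text)
        total += count
    return text, total


line = "Βίβλος γενέσεως ᾿Ιησοῦ Χριστοῦ, υἱοῦ Δαυῒδ, υἱοῦ ᾿Αβραάμ."
result, count = replace_all(line, match)
print(result)
print("Changes: " + str(count))
```

Since the keys are plain strings rather than patterns, `text.replace(old, new)` together with `text.count(old)` would presumably work as well, without involving `re` at all.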