I have a collections of text files, and I want to replace certain characters within them with another character using Python. The characters I wish to replace are from the Welsh language, and they are digraphs. They are two separate characters, which form a single letter.
Here are some Welsh digraphs and some characters they could be replaced by:
ch - ƒ (ASCII code 131)
dd - Œ (ASCII code 140)
ff - ¤ (ASCII code 164)
The text files I will be working with may be fairly large (a few GB's) and there are 8 digraphs; in total there will be 24 replacement characters needed to cover all forms (ch, Ch, CH). I was wondering what would be an effective and efficient way to implement these replacements?
UPDATE:
I have a working (so far) version of a program which was based on an answer from this question:
replacing text in a file with Python
Here is my code:
replacements = {'ch':'ƒ', 'Ch':'†', 'ff':'¤', 'FF':'¦', 'Dd':'•', 'll':'º', 'Ll':'¿'}
print("Input file location: ")
inLoc = input("> ")
print("Output file location: ")
outLoc = input("> ")
with open(inLoc, "r") as infile, open(outLoc, "w") as outfile:
for line in infile:
for src, target in replacements.items():
line = line.replace(src, target)
outfile.write(line)
Input text:
Ydych Chi'n hoffi COFFI?
Dda de. Lle why ti llywelyn?
Output text:
Ydyƒ †i'n ho¤i CO¦I?
•a de. ¿e why ti ºywelyn?