I am trying to perform some replacements in a file:
'\t' --> '◊'
'⁞' --> '\t'
This question recommends the following procedure:
import fileinput
with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        line = line.replace('\t', '◊')
        print(line.replace('⁞', '\t'), end='')
I am not allowed to comment there, but when I run this piece of code I get an error saying:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 10: character maps to <undefined>
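A small illustration of where byte 0x81 most likely comes from. It assumes the file is UTF-8 and that the interpreter's default encoding is the Windows code page cp1252; the 'charmap' in the message points to a single-byte code page, but which one is an assumption. '⁞' (U+205E) is stored in UTF-8 as three bytes, the middle one being 0x81, and cp1252 maps nothing to 0x81.

```python
# '⁞' (U+205E) as stored in a UTF-8 file: three bytes, the middle one is 0x81.
raw = "⁞".encode("utf-8")
print(raw)  # b'\xe2\x81\x9e'

# cp1252 (assumed here to be the platform default) has no character for 0x81,
# so decoding the UTF-8 bytes with it fails the same way as in the traceback.
try:
    raw.decode("cp1252")
    error = None
except UnicodeDecodeError as exc:
    error = exc
print(error)

# Decoding with the codec the file was actually written in works.
print(raw.decode("utf-8"))  # ⁞
```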
I have previously remedied this kind of error by passing encoding='utf-8' to open().
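The problem is that fileinput.FileInput() does not accept an encoding argument in the Python version I am using (the keyword-only encoding and errors parameters were only added in Python 3.10).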
Question: How can I get rid of this error?
I would prefer the above solution if it can be made to work and its speed is comparable to the method below, since it appears to perform the replacements in place, as it should.
I have also tried:
replacements = {'\t': '◊', '⁞': '\t'}
with open(filename, encoding='utf-8') as inFile:
    contents = inFile.read()
with open(filename, mode='w', encoding='utf-8') as outFile:
    for old, new in replacements.items():
        contents = contents.replace(old, new)
    outFile.write(contents)
This is relatively fast, but very memory-hungry: it reads the whole file into a single string, and each str.replace() call builds another full-size copy.
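A possible middle ground, not taken from the original post and offered only as a sketch: stream the file line by line into a temporary file next to it, then swap it in with os.replace(). This is similar in spirit to what fileinput's inplace mode and `sed -i` do, but with an explicit encoding, so it also works before Python 3.10. Because both substitutions are single characters, it uses one str.maketrans() table so each line is translated in a single pass; translate() does not re-map its own output, so '⁞' becoming '\t' is not turned into '◊', which matches the original replacement order. Assumptions: UTF-8 input, a writable directory containing the file, and the hypothetical names TABLE and swap_separators_streaming. Peak memory is roughly the length of the longest line, not the whole file.

```python
import os
import shutil
import tempfile

# Both single-character substitutions in one table, applied in one pass.
TABLE = str.maketrans({"\t": "◊", "⁞": "\t"})


def swap_separators_streaming(filename, encoding="utf-8"):
    """Rewrite filename line by line; memory use is bounded by the longest line."""
    directory = os.path.dirname(os.path.abspath(filename))
    # The temporary file lives next to the original so that os.replace()
    # is a rename within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the file's existing line endings untouched.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst, \
             open(filename, encoding=encoding, newline="") as src:
            for line in src:
                dst.write(line.translate(TABLE))
        # mkstemp creates the file with mode 0o600; copy the original's mode.
        shutil.copymode(filename, tmp_path)
        os.replace(tmp_path, filename)
    except BaseException:
        # Remove the partial output and leave the original file intact.
        os.remove(tmp_path)
        raise


# Usage on a throwaway file with Windows line endings.
demo_path = os.path.join(tempfile.mkdtemp(), "file.csv")
with open(demo_path, "w", encoding="utf-8", newline="") as f:
    f.write("x\ty⁞z\r\n1⁞2\t3\r\n")
swap_separators_streaming(demo_path)
with open(demo_path, encoding="utf-8", newline="") as f:
    result = f.read()
print(repr(result))  # 'x◊y\tz\r\n1\t2◊3\r\n'
```

A file with no line breaks at all is still read as one "line", so this only bounds memory for files that actually contain newlines.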
For UNIX users: what I need is the equivalent of the following:
sed -i 's/\t/◊/g' 'file.csv'
sed -i 's/⁞/\t/g' 'file.csv'
This turns out to be rather slow.