I have a file containing a list of Unicode characters which, due to a copy-paste mistake, also contains a hex code every 16 characters, e.g.
Ս Վ Տ 0550 Ր Ց Ւ Փ Ք Օ Ֆ ՙ ՚ ՛ ՜ ՝ ՞ ՟ 0560 ՠ ա բ գ
with the 0550 and 0560 in the middle. I want to make a program that will remove these numbers, but when I try to read the file, it raises an error:
Traceback (most recent call last):
  File "C:\Users\Millicent\Desktop\a.py", line 1, in <module>
    open('characters.txt').read()
  File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 392: character maps to <undefined>
My current code is:
with open('characters.txt', 'r') as file:
    chars = file.read().split()

def isdigit(string):
    # True if the string parses as a hexadecimal number
    try:
        int(string, 16)
        return True
    except ValueError:
        return False

# Drop the 4-character hex codes and keep everything else
chars = list(filter(lambda s: not (len(s) == 4 and isdigit(s)), chars))

with open('characters.txt', 'w') as file:
    file.write(''.join(chars))
Can someone tell me how to make Python accept the special characters?
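For reference, here is a minimal sketch of one direction that might work. The traceback shows Python decoding with cp1252 (the Windows default when `open` is given no encoding), so the sketch passes an encoding explicitly. It assumes the file was saved as UTF-8, which nothing above confirms. The names `strip_hex_codes` and `clean_file` are made up for illustration, and the "exactly four hex digits" pattern is an assumption based on the sample line.

```python
import re

# Assumption: the stray codes are exactly four hex digits standing alone
# between whitespace, as in the sample line (0550, 0560).
HEX_CODE = re.compile(r'[0-9A-Fa-f]{4}')


def strip_hex_codes(text):
    """Return the whitespace-separated tokens of text, minus 4-digit hex codes."""
    return [tok for tok in text.split() if not HEX_CODE.fullmatch(tok)]


def clean_file(path, encoding='utf-8'):
    """Rewrite the file at path without the hex codes.

    The 'utf-8' default is a guess about how the file was saved;
    pass a different encoding if the guess is wrong.
    """
    # Passing encoding explicitly avoids the platform default (cp1252 here).
    with open(path, 'r', encoding=encoding) as f:
        tokens = strip_hex_codes(f.read())
    with open(path, 'w', encoding=encoding) as f:
        f.write(' '.join(tokens))
```

Calling `clean_file('characters.txt')` would then rewrite the file in place. The tokens are joined with single spaces here to keep the original spacing; `''.join(tokens)` would give the run-together output instead.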