While writing a string to a file, I am getting an error:

'charmap' codec can't encode character '\ufb01' in position 108: character maps to <undefined>

This is what I have tried:

import re
file = open(filepath, "w")
temp_con = content
content = re.sub(r'\W+ \.', '', temp_con)
print(content)
file.write(content)

When printed, the string is:

By noon they will all be at my new
house in the Victor's Village. The
reporters, the camera crews, even Effie
Trinket, my old escort, will have made
their way to District 12 from the Capitol.
I wonder if Effie will still be wearing that
silly pink wig, or if she'll be sporting
some other unnatural colour especially
for the Victory Tour. There will be others
waiting, too. A staff to cater to my every
need on the long train trip. A prep team
to beautify me for public appearances.
My stylist and friend, Cinna, who
designed the gorgeous outfits that first
made the audience take notice of me in
the Hunger Games.

If it were up to me, I would try to
forget the Hunger Games entirely. Never
speak of them. Pretend they were

How do I resolve this?

Note: I tried the suggestion from this question, but it turned out to be a solution for Python 2.

  • The problem is the same as in the duplicate I linked, but the other way around: since you are writing rather than reading, you need to encode from the string to bytes, rather than decoding from bytes to string. – Karl Knechtel May 21 '20 at 18:00
  • It's strange that it references a wide-character byte code sequence. If that's the default, wide character is already recognized as the BMP encoding even without surrogates. Create and read in a file with these characters in it: 龜𢡊𢡄𣏕㮝䀘䀹𥉉, \uFACE - \uFAD5. Even if you open it with utf-8, if the characters are in the BMP, a default encoding of 2 bytes should have successfully decoded it. And that would be a bug if not. –  May 21 '20 at 18:30

1 Answer

When you open a file, you need to provide an encoding parameter that can handle all the characters you need to read or write. If you omit it, Python falls back to a platform-dependent default, which here is a 'charmap' codec. In this case it's complaining about the "ﬁ" ligature (U+FB01), which isn't part of many character sets. If you specify UTF-8, it should be able to handle it.

file = open(filepath, "w", encoding='utf-8')
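As a slightly fuller sketch of the same idea, the snippet below first reproduces the failure and then writes the text with an explicit encoding. The sample string and the temporary `filepath` are made up for illustration, and `cp1252` is only an assumption about which default codec produced the 'charmap' error (it is a common default on Windows); the actual default on your machine may differ.

```python
import os
import tempfile

# Hypothetical sample text containing the "fi" ligature (U+FB01).
content = "The \ufb01rst line contains a ligature."

# Reproduce the failure: cp1252 (assumed default here) has no mapping for U+FB01.
try:
    content.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 failed:", exc)

# Write with an explicit UTF-8 encoding; the with-block also closes the file.
filepath = os.path.join(tempfile.mkdtemp(), "output.txt")  # hypothetical path
with open(filepath, "w", encoding="utf-8") as f:
    f.write(content)

# Read it back with the same encoding to confirm nothing was lost.
with open(filepath, encoding="utf-8") as f:
    round_trip = f.read()
```

Whatever reads the file later needs to open it as UTF-8 as well, otherwise the ligature may show up as garbage.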
Mark Ransom