
My program takes in a string and then tries to write that string to a file. I think the issue is that the string contains special characters (ü, ç, etc.).

When I try to write the string to the file directly, I get this error at runtime (or something similar):

UnicodeEncodeError: 'charmap' codec cannot encode character '\u200b' in position 16: character maps to <undefined>

So then I wrote a function that looks like this:

def try_encode(info):
    if info is None:
        return None
    temp = (str(info.encode('utf-8'))).replace("\n","")
    return '"' + temp[2:len(temp)-1] + '"'

(I want to get rid of all newlines and write the string surrounded in quotes)

But when I run this, the file that is written contains strange character sequences that all start with: \x

Some examples of these characters are:

\xc3, \xa9p, \xaa, \xe2, \x80, etc

I think that these correspond to the special characters that I mentioned above. I have experimented with different encodings (utf-16, and utf-7), but they all either don't help or make these characters more common. Could anyone help me figure out how to get rid of these?

EDIT: including the code where I open the file:

f = open(filename, "w")
user1519226
  • Are you using python 2 or 3? Python 3 makes these things somewhat easier. – Arndt Jonasson Apr 24 '18 at 20:50
  • Python 3, sorry for not clarifying. – user1519226 Apr 24 '18 at 20:55
  • Others have asked about this message, for example here: https://stackoverflow.com/questions/44391671/python3-unicodeencodeerror-charmap-codec-cant-encode-characters-in-position , maybe that helps? – Arndt Jonasson Apr 24 '18 at 21:01
  • The issue is probably in how you open the file, please include that code. – Josh Lee Apr 24 '18 at 21:46
  • @JoshLee it has been added – user1519226 Apr 24 '18 at 22:10
  • Possible duplicate of [python3 UnicodeEncodeError: 'charmap' codec can't encode characters in position 95-98: character maps to <undefined>](https://stackoverflow.com/questions/44391671/python3-unicodeencodeerror-charmap-codec-cant-encode-characters-in-position) – lenz Apr 25 '18 at 09:05

1 Answer

You are doing things in the wrong order.

There are strings, and there are binary representations of those strings (also known as encodings). One should work with strings, and convert between Unicode text and binary (encoded) text only at the moment of writing (and reading). This is the abstract view; ignore the internal representation of strings in Python.

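Your str(info.encode('utf-8')) does not make much sense: you are telling Python to encode info to UTF-8 bytes, and then to turn those bytes back into a string. Called without an encoding argument, str() does not decode the bytes; it returns their printable representation (something like b'\xc3\xbc'), which is where the \x sequences in your file come from, and why you had to cut off the leading b' and the trailing '.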
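A small sketch of what happens to a single character may make this concrete. The variable names are only illustrative, and ü is written as the escape \u00fc so the example does not depend on how this page itself is encoded:

```python
text = "\u00fc"                     # the one-character string "ü"

encoded = text.encode("utf-8")      # bytes: two bytes, 0xC3 0xBC
as_repr = str(encoded)              # NOT a decode: gives the printable form of the bytes object
decoded = encoded.decode("utf-8")   # the real inverse of encode()

print(as_repr)                      # the literal characters b, ', \, x, c, 3, ...
print(len(as_repr), len(decoded))
```

Here as_repr is an ordinary string that contains backslashes and hex digits, so writing it to a file produces exactly the \xc3-style sequences from the question, while decoded is the original text again.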

The replace and the addition of quotes should be applied to strings only. So your function never does what its name, try_encode, implies: it encodes nothing.

So the problem is not only this (broken) function; it is in how you save or print Python strings. On Unix/Linux/macOS the default encoding is usually UTF-8, but I expect you are on Windows, where there is no real default (it depends on the local configuration). So you should explicitly specify which encoding to use, e.g. by passing an extra parameter to open, such as encoding='utf-8'. On Windows this is practically mandatory, but it is better to be explicit and not rely on implicit conventions on other operating systems either.
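As a rough sketch of that approach, assuming the goal stated in the question (strip newlines, wrap the text in quotes, write it to a file): the helper names quote_single_line and write_text, the sample text, and the temporary file path are all made up for illustration.

```python
import os
import tempfile


def quote_single_line(info):
    """Hypothetical helper: remove newlines and wrap the text in double quotes."""
    if info is None:
        return None
    # Work on the string itself; no encode()/str() round trip is needed.
    return '"' + info.replace("\n", "") + '"'


def write_text(path, text):
    # Naming the encoding here means the platform default is never consulted.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Sample containing é, a zero-width space (U+200B), a newline and ï.
sample = "caf\u00e9\u200b\nna\u00efve"

path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_text(path, quote_single_line(sample))

# Read it back with the same encoding to check the round trip.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

The encoding happens once, inside open, at the moment the text is written; everything before that operates on plain strings.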

For printing, things are more complex, because the program should not choose the encoding; instead, the terminal/console tells the program which encoding it supports (and can display). So in this case it is inevitable that some characters will be escaped or rejected if the terminal does not support full Unicode. You may want to change the settings of the console.
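One way to see what the console reports, and to print without raising an exception, is sketched below. The helper to_printable is hypothetical, and falling back to ASCII when no encoding is reported is an assumption of this sketch rather than anything Python does by itself:

```python
import sys


def to_printable(text, encoding):
    """Hypothetical helper: escape whatever `encoding` cannot represent."""
    # backslashreplace turns unencodable characters into \x.. or \u.... escapes
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# The encoding the console reported to Python; it may be missing if stdout was replaced.
console_encoding = getattr(sys.stdout, "encoding", None) or "ascii"

text = "caf\u00e9\u200b"   # é followed by a zero-width space (U+200B)
print(console_encoding)
print(to_printable(text, console_encoding))
```

On a UTF-8 terminal this prints the text unchanged; on a console limited to a legacy code page, only the characters that code page lacks are shown as escapes.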

Giacomo Catenazzi