
I am trying to join a Python list of strings into a single string with `'\n'.join(self.product.features)` so that I can save it into a file. The list looks like this:

    [
      "【SIX IN ONE】This digital radio alarm clock combines 6 functions: Digital clock 12/24 Hour Format for checking time; dual alarm clock with individual alarm volume control for awaking you up and you can adjust the alarm volume level; FM radio for listening to news& weather forecast; auto brightness & 3 steps dimmer control for eyes care; USB charging port for mobile device and easy to charge your phone near bedside; 3.5 mm jack (not included) for external audio source.",
      "【LARGE BRIGHT DISPLAY】Large 1.4-inch Cyan Blue LED display without any blink makes time easy to read from a far distance, auto brightness & 3 steps dimmer control for eyes caring, auto set the display to a brighter setting in daytime and softer one at night.",
      "【AUTO TIME SET】 Once you plugged this radio alarm clock to the AC Outlet, default EST time will be displayed. DST (Daylight Saving Time) will be switching automatically, Simple Time Zone Alignment (Press and hold SET button then adjust the Hour by Tune Up or Down), Backup Battery to maintain Time and Alarm Setting.",
      "【SUITABLE FOR HOME&OFFICE】You can place this radio alarm clock on bedside table; Office desk; kitchen; study table or cabinet at sitting room - Need to connect to main power.",
      "【30 DAYS MONEY BACK GUARANTEE】Please feel free to contact us if you have any questions on this radio alarm clock and you can buy with confidence at any time thanks to our 30-day money back guarantee."
    ]

This is my code, which joins the strings and saves them to the file:

    txtfile = open(self.productDir + "\\Product Details.txt", "w+")
    ...
    txtfile.write("\n\n")
    if self.product.features:
        txtfile.write('\n'.join(self.product.features))
    else:
        txtfile.write('Has no features')
    ...
    txtfile.close()

But I get this error:

    UnicodeEncodeError: 'charmap' codec can't encode character '\u3010' in position 0: character maps to <undefined>

I can see that some characters cannot be encoded; I am just not sure how to encode/decode them, or bypass them, in this case.

jwodder
  • Just set the encoding name in the `open` function (e.g. `utf-8`); by default, Python uses the system encoding – Kirill Ermolov Oct 16 '17 at 13:03
  • Yes, that worked: `txtfile = open(self.productDir+"\\Product Details.txt", "w+", encoding="utf-8")` –  Oct 16 '17 at 13:09
  • I also recommend using the `with` statement; it is not related to your question, but it makes the code clearer. - http://www.pythonforbeginners.com/files/with-statement-in-python – Kirill Ermolov Oct 16 '17 at 13:09
  • Possible duplicate of [UnicodeEncodeError: 'charmap' codec can't encode characters](https://stackoverflow.com/questions/27092833/unicodeencodeerror-charmap-codec-cant-encode-characters) – Harman Oct 16 '17 at 13:15
  • @Harman: the cause is not far from the one of the proposed duplicate, but the solution may be different. – Serge Ballesta Oct 16 '17 at 13:32
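The fix from the comments (pass `encoding="utf-8"` to `open`) combined with the suggested `with` statement might look roughly like the sketch below. `save_features`, `product_dir` and `features` are hypothetical stand-ins for the question's `self.productDir` and `self.product.features`; I also swapped in `os.path.join` for the string concatenation and mode `"w"` for `"w+"`, since the file is never read back through the same handle.

```python
import os
import tempfile


def save_features(product_dir, features):
    """Write the feature list to 'Product Details.txt' as UTF-8 (hypothetical helper)."""
    path = os.path.join(product_dir, "Product Details.txt")
    # An explicit encoding avoids the platform default (e.g. cp1252 on Windows),
    # and the with-block closes the file even if a write raises.
    with open(path, "w", encoding="utf-8") as txtfile:
        txtfile.write("\n\n")
        if features:
            txtfile.write("\n".join(features))
        else:
            txtfile.write("Has no features")
    return path


# Usage: write into a temporary directory and read the text back as UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_features(tmp, ["【SIX IN ONE】Digital clock", "【AUTO TIME SET】Backup battery"])
    with open(saved, encoding="utf-8") as f:
        content = f.read()
```

Reading the file back requires the same `encoding="utf-8"`; otherwise the default encoding is used again and the brackets come back as mojibake.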

1 Answer


As you wrote your path with backslashes (\), I assume that you are using Windows. When no encoding is given, `open` uses the locale's preferred encoding, which on Windows is the ANSI code page: typically cp1252 (or another 8-bit encoding). The character U+3010 (LEFT BLACK LENTICULAR BRACKET) does not exist in the cp1252 charset (and probably does not exist in any common 8-bit Windows charset either).
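A small way to check this on your own machine: print the default encoding `open` falls back to, then try to encode the bracket yourself. The printed encoding depends on the system and Python version; cp1252 is only the typical Windows value.

```python
import locale

# Encoding that open() falls back to when no encoding= is passed.
# On many Windows setups this prints 'cp1252'; elsewhere it is often 'UTF-8'.
print(locale.getpreferredencoding(False))

error_message = None
try:
    # U+3010 has no byte in cp1252, so this raises the question's error.
    "\u3010".encode("cp1252")
except UnicodeEncodeError as exc:
    error_message = str(exc)
print(error_message)

# UTF-8 can encode every Unicode character, including this bracket.
utf8_bytes = "\u3010".encode("utf-8")
print(utf8_bytes)  # b'\xe3\x80\x90'
```

The message printed by the `except` branch is the same one reported in the question, which confirms that the failure comes from the cp1252 encoder rather than from `join`.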

Possible workarounds:

  • encode and decode back, ignoring unmappable characters (the brackets will disappear):

    charset = 'ascii'
    txt = '\n'.join(self.product.features).encode(charset, 'ignore').decode(charset)
    txtfile.write(txt)
    
  • replace the offending characters with sensible equivalents, so the brackets appear as ( and ); this will still fail if the string contains other unmappable characters:

    txt = '\n'.join(self.product.features).replace('\u3010', '(').replace('\u3011', ')')
    txtfile.write(txt)
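Both workarounds can be tried on a small sample. The strings below are made-up stand-ins for `self.product.features`; I added U+2713 (CHECK MARK) only to show that the `replace` approach still fails on a character it does not know about.

```python
# Hypothetical sample data standing in for self.product.features.
features = ["\u3010SIX IN ONE\u3011Digital clock", "\u3010AUTO TIME SET\u3011DST \u2713"]
text = "\n".join(features)

# Workaround 1: round-trip through ASCII, silently dropping anything it cannot encode.
charset = "ascii"
dropped = text.encode(charset, "ignore").decode(charset)
print(dropped)

# Workaround 2: map only the two known brackets to parentheses.
replaced = text.replace("\u3010", "(").replace("\u3011", ")")

# The check mark is still not in cp1252, so writing `replaced` would still fail.
try:
    replaced.encode("cp1252")
    still_fails = False
except UnicodeEncodeError:
    still_fails = True
```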
    

IMHO the best approach is to combine both, optionally using the cp1252 charset if your system uses it:

    txt = '\n'.join(self.product.features).replace('\u3010', '(').replace('\u3011', ')')
    charset = 'cp1252'
    txtfile.write(txt.encode(charset, 'ignore').decode(charset))
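A sketch of the combined approach as a reusable helper, under my own assumptions: `to_charset` and `SUBSTITUTES` are hypothetical names, `str.translate` stands in for the chained `replace` calls, and the file is opened with an explicit `encoding="cp1252"` so the result does not depend on the machine's default encoding.

```python
import os
import tempfile

# Readable substitutes for characters known to appear in the data (extend as needed).
SUBSTITUTES = str.maketrans({"\u3010": "(", "\u3011": ")"})


def to_charset(text, charset="cp1252"):
    """Substitute known characters, then drop whatever `charset` still cannot encode."""
    text = text.translate(SUBSTITUTES)
    return text.encode(charset, "ignore").decode(charset)


features = ["\u3010SIX IN ONE\u3011Digital clock", "\u3010AUTO TIME SET\u3011DST \u2713"]
cleaned = to_charset("\n".join(features))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Product Details.txt")
    # Every character left in `cleaned` is encodable in cp1252, so this write succeeds.
    with open(path, "w", encoding="cp1252") as txtfile:
        txtfile.write(cleaned)
    with open(path, encoding="cp1252") as f:
        round_trip = f.read()
```

Unlike the pure ASCII round trip, cp1252 keeps accented Latin letters such as `é`, so only characters with no cp1252 byte and no entry in `SUBSTITUTES` are lost.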
Serge Ballesta