
I'm trying to make a function which prints to the command prompt and to a file. I get encoding/decoding errors with the following code:

import os

def pas(stringToProcess): #printAndSave
  print stringToProcess 
  try: f = open('file', 'a')
  except: f = open('file', 'wb')
  print  >> f, stringToProcess
  f.close()

all = {u'title': u'Pi\xf1ata', u'albumname': u'New Clear War {EP}', u'artistname': u'Montgomery'}

pas(all['title'])

I get the following output:

Piñata
Traceback (most recent call last):
  File "new.py", line 17, in <module>
     pas(all['title'])
  File "new.py", line 11, in pas
    print  >> f, stringToProcess
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 2: ordinal not in range(128)

I've tried all the encode()/decode() permutations I can imagine from similar answers on here, without success. How can this error be solved?

stretch
  • Why are you opening the file in text mode when appending, but in binary mode if an exception was thrown? Not that it matters here: you are not writing newlines, and opening the file in append mode will either work, or opening it in `'wb'` mode will fail for the same reasons that append mode failed. – Martijn Pieters Dec 30 '14 at 10:35
  • And `all['title'].encode('utf8')` would work just fine. What *did* you try? – Martijn Pieters Dec 30 '14 at 10:36
  • I think he might be trying to append to a file if it exists, otherwise create it. – Burhan Khalid Dec 30 '14 at 10:38
  • I tried all sorts of codecs with errors='replace' and errors='ignore', but I think I was encoding strings that were already encoded, as I had these issues earlier. Once I removed all traces of encode/decode apart from the one in the function, it worked. That is correct, Burhan Khalid. – stretch Dec 30 '14 at 11:14

3 Answers


As someone commented, you probably just need to specify which codec to use when writing the string. E.g., this works for me:

def pas(s):
    print(s)
    with open("file", "at") as f:
        f.write("%s\n" % s.encode("utf-8"))

pas(u'Pi\xf1ata')
pas(u'Pi\xf1ata')

As you can see, I explicitly open the file in append/text mode; if the file doesn't exist, it will be created. I also use `with` instead of your try-except approach, which is merely the style I prefer.

As Bhargav says, you can also set the default encoding. It all depends on how much control you need in your program and both ways are fine.
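
A third option, sketched here under the assumption that `io.open` is available (Python 2.6+ and Python 3), is to give the file itself an encoding so that unicode text can be written directly. The helper name `append_line` and the temporary demo path are hypothetical, and the sketch covers only the file-writing half of `pas`:

```python
import io
import os
import tempfile

def append_line(text, path):
    """Append unicode `text` as one UTF-8 encoded line to `path`."""
    # Mode 'a' creates the file if it does not exist yet;
    # the encoding argument makes the file object do the encoding.
    with io.open(path, 'a', encoding='utf-8') as f:
        f.write(text + u'\n')

# Hypothetical demo path inside a fresh temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), 'file')
append_line(u'Pi\xf1ata', demo_path)
append_line(u'Pi\xf1ata', demo_path)

# Read the file back, decoding with the same codec
with io.open(demo_path, 'r', encoding='utf-8') as f:
    lines = f.read().splitlines()
```

With this approach the string is never encoded by hand, which avoids accidentally encoding it twice.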

csl
  • An easier way to do this is to use [`codecs.open()`](https://docs.python.org/2/library/codecs.html?highlight=open#codecs.open) because the encoding/decoding of any data written/read is done automatically. – martineau Dec 30 '14 at 12:46
  • @martineau I actually didn't know about it! Does it default to UTF-8 the first time the file is created? Thanks. – csl Jan 01 '15 at 20:30
  • The [docs](https://docs.python.org/3/library/codecs.html?highlight=open#codecs.open) don't mention a default encoding, although the argument is optional... so I would assume it doesn't do any encoding for you if it's not specified (although that would make it just like a regular `open()`). – martineau Jan 01 '15 at 21:01
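
The `codecs.open()` suggestion from the comments above can be sketched roughly as follows. The encoding is passed explicitly here rather than relying on any default, and the temporary demo path is a hypothetical stand-in for the real file:

```python
import codecs
import os
import tempfile

# Hypothetical demo path inside a fresh temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), 'file')

# With an explicit encoding, the returned file object encodes
# unicode text on write...
with codecs.open(demo_path, 'a', encoding='utf-8') as f:
    f.write(u'Pi\xf1ata\n')

# ...and decodes it back to unicode on read.
with codecs.open(demo_path, 'r', encoding='utf-8') as f:
    content = f.read()
```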

Use `sys.setdefaultencoding('utf8')` to prevent the error from occurring.

That is:

import sys
reload(sys)  # restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf8')

def pas(stringToProcess):  # printAndSave
  print stringToProcess
  try: f = open('file', 'a')
  except: f = open('file', 'wb')
  print >> f, stringToProcess
  f.close()

all = {u'title': u'Pi\xf1ata', u'albumname': u'New Clear War {EP}', u'artistname': u'Montgomery'}

pas(all['title'])

This would print

Piñata
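
For context, here is a minimal sketch of what goes wrong in the question. In Python 2, writing a `unicode` object to a byte-oriented file makes the interpreter encode it implicitly with the default codec, which is ASCII unless it has been changed. The snippet below performs that encoding step explicitly; the variable names are illustrative only:

```python
title = u'Pi\xf1ata'

# Encoding with ASCII fails because u'\xf1' is outside the range 0-127;
# this is the same error that the implicit conversion raises.
try:
    title.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding with UTF-8 succeeds and represents u'\xf1' as two bytes.
utf8_bytes = title.encode('utf-8')
```

Changing the default encoding makes the implicit step use UTF-8 instead, but it does so for every implicit conversion in the process, not just for this file.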
Bhargav Rao

I've just tried this and it works. It was an interesting question.

Encoding is always a bit tricky:

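def pas(stringToProcess): #printAndSave
    # Encode once, here, so the file receives UTF-8 bytes
    strtp = stringToProcess.encode('utf-8')
    print stringToProcess
    try: f = open('file.txt', 'a')
    except: f = open('file.txt', 'wb')
    f.write(strtp + '\n')  # newline added so repeated calls do not run together
    f.close()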

all = {u'title': u'Pi\xf1ata', u'albumname': u'New Clear War {EP}', u'artistname': u'Montgomery'}

pas(all['title'])
Bestasttung