For the life of me, I can't figure out what I am doing wrong:

import urllib
import csv

with open("mydb.txt", 'rb') as f:
    readr = csv.reader(f, delimiter = ",", quotechar="'")
    for row in readr:
        mylist = []
        for i in row:
            code=urllib.unquote(i)
            mylist.append(code)
        print mylist

The problem is that I keep getting output like this:

['S\xc3\xa3o Desid\xc3\xa9rio', 'BA', 'Convencional', '1759', '-12.52332', '-45.59509']

What is this 'S\xc3\xa3o Desid\xc3\xa9rio'? It should be São Desidério. How can I fix it?

Martijn Pieters
relima

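  1. Your data is actually fine: '\xc3\xa3' and '\xc3\xa9' are the UTF-8 encodings of 'ã' and 'é'. The escapes appear because you are printing the list object instead of its members, and printing a list shows the repr() of each element; in Python 2, the repr() of a byte string escapes every non-ASCII byte. Try using str.join to format the list to your liking.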

  2. Consider the "Unicode sandwich" approach ("bytes on the outside, Unicode on the inside"): decode all of your input to unicode as soon as you read it, and encode it back to bytes only on output.
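Both points can be seen in a few lines. This is a minimal sketch in Python 3 rather than the Python 2 used in the question: in Python 3, bytes plays the role of Python 2's str, and str plays the role of unicode. The field values are copied from the output shown in the question; the variable names are just illustrative.

```python
# Python 3 sketch: fields as they come off disk, i.e. raw UTF-8 bytes
raw_row = [b'S\xc3\xa3o Desid\xc3\xa9rio', b'BA']

# Printing a list shows the repr() of each element, so the non-ASCII
# bytes appear as \x escapes, just like the output in the question.
print(raw_row)  # [b'S\xc3\xa3o Desid\xc3\xa9rio', b'BA']

# Unicode sandwich, input side: decode to text as soon as data is read.
text_row = [field.decode('utf-8') for field in raw_row]

# Joining the members (instead of printing the list) yields readable
# text: line is 'São Desidério, BA'.
line = ', '.join(text_row)

# Output side: encode back to bytes only when writing the result out.
output = line.encode('utf-8')
```

Note that the bytes decode to "Desidério" (é is \xc3\xa9; á would be \xc3\xa1).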

This program might suit you:

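import urllib
import csv

# Python 2: the csv module wants files opened in binary mode
with open("mydb.txt", 'rb') as f:
    readr = csv.reader(f, delimiter=",", quotechar="'")
    for row in readr:
        mylist = []
        for i in row:
            # Unquote the raw bytes first, then decode them to unicode;
            # unquoting an already-decoded unicode string mangles %XX escapes
            code = urllib.unquote(i).decode('utf-8')
            mylist.append(code)
            # Debug output: each field should now be <type 'unicode'>
            print type(code), code.encode('utf-8')
        # Encode back to UTF-8 bytes only when printing the joined row
        print u','.join(mylist).encode('utf-8')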
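For comparison, here is a minimal sketch of the same idea in Python 3, where most of the sandwich is handled for you: open() with encoding='utf-8' decodes on input, and urllib.parse.unquote decodes %XX escapes as UTF-8 by default. It assumes, as the program above does, that mydb.txt is UTF-8 encoded; the question doesn't say whether the fields actually contain %XX escapes, and the name read_rows is hypothetical.

```python
import csv
from urllib.parse import unquote


def read_rows(lines):
    """Yield each CSV row from an iterable of text lines as a list of str.

    Any %XX escapes in a field are decoded as UTF-8 by unquote().
    """
    reader = csv.reader(lines, delimiter=',', quotechar="'")
    for row in reader:
        yield [unquote(field) for field in row]


# Usage (assumes mydb.txt exists and is UTF-8 encoded):
# with open('mydb.txt', newline='', encoding='utf-8') as f:
#     for row in read_rows(f):
#         print(','.join(row))
```

Taking an iterable of lines rather than a file name keeps the decoding step (the open() call) visibly at the edge of the program, which is the point of the sandwich.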
Robᵩ