I know there are already some questions about this topic, but I couldn't find the answer I'm looking for, so I'll ask anyway. I'm a beginner :)

I have this simple function:

f =[]
def extract_row():
    with open('country_codes.txt') as infile:
        for line in infile:
            x = (line.split()[0])
            f.append(x)
        print (f)
extract_row()

It runs on Python 2.7 and gives me the information I need:

['AD', 'AE', 'AF', 'AG', 'AI', 'AL', 'AM', 'AN', 'AO', 'AQ', 'AR'...

But when I try to run it on Python 3.4, I get this error:

Traceback (most recent call last):
  File "/Users/juanlozano/Documents/geonames/extractRow.py", line 8, in <module>   
    extract_row()
  File "/Users/juanlozano/Documents/geonames/extractRow.py", line 4, in extract_row
    for line in infile:
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 31: ordinal not in range(128)

Could anyone explain what is going on?

These are some lines from the txt file I'm using: [image]

Andrea Corbellini
Juanloz

2 Answers

I OCR'ed your image in Google Drive. Not perfect but good enough to replicate:

AD AND 20 AN Andorra Andorra la Vella 468. 0 84 EU
AE ARE 784 AE United Arab Emirates Abu Dhabi 82,880.0 4,975, 593 AS
AF AFG 4 AF Afghanistan Kabul 647, 500.0 29, 121,286 AS
AG ATG 28 AC Antigua and Barbuda St. John's 443.0 86,754 NA
AI AIA 660 AV Anguilla The Valley 102.0 13, 254 NA
ALE 8 AL Albania Tirana 28,748,0 2,986, 952 EU
ARM 51 AM Armenia Yerevan 29,800.0 2,968,000 AS
ANT 530 NT Willemstad 960. 0 136, 197 NA 24 A0 Angola Luanda 1,246,700.0 13,068,161 AF
AQ 10 AY Antarctica 14,000,000.0 0 AN
AR B2 AR Argentina Buenos Aires 2,766, 890. 0 41,343, 201 SA
AS 16 AQ American Samoa Pago Pago 199.0 57,881 0C
AT 40 AU Austria Vienna 83,858.0 8,205,000 EU
AU AUS 36 AS Australia Canberra 7,686,850.0 21,515,754 OC
AW AA Aruba Oranjestad 193.0 71,566 NA
AX Åland Mariehamn 1,580.0 26,711 EU
AZ AJ Azerbaijan Baku 86,600.0 8,303,512 AS
BA BK Bosnia and Herzegovina Sarajevo 51, 129.0 4,590,000 EU
BB BB Barbados Bridgetown 431. 0 285,653 NA
BD BG Bangladesh Dhaka 144,000.0 156,118,464 AS
BE BE Belgium Brussels 30,510.0 10,403,000 EU
BF UV Burkina Faso Ouagadougou 274,200.0 16, 241, 811 AF
BG BU Bulgaria Sofia 110,910.0 7, 148,785 EU
BH BA Bahrain Manama | 665.0 738,004 AS
BI BY Burundi Bujumbura 27,830.0 9,863, 117 AF
BJ EN Benin Porto-Novo 112,620.0 9,056,010 AF
BL TB Saint Barthélemy Gustavia 21. 0 8, 45 NA
EM BD Bermuda Hamilton 53.0 65,365 NA
BN BX Brunei Bandar Seri Begawan 5,770.0 395,027 AS
B0 BL Bolivia Sucre 1,098,580,0 9,947, 418 SA
BQ Bonaire_328.0 18,012 NA

I then entered your code with the addition of encoding='ascii' as shown below:

f = []
def extract_row():
    with open('country_codes.txt', encoding='ascii') as infile:
        for line in infile:
            x = line.split()[0]
            f.append(x)
        print(f)

extract_row()

And got the error UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 763: ordinal not in range(128).
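To see where such a byte comes from, here is a minimal sketch that reproduces the failure in memory, without any file. The sample line is an assumption modelled on the "AX Åland" row above and encoded as UTF-8; the real file may contain different bytes.

```python
# Assumed sample line, modelled on the "AX" row of the OCR'ed data.
line = "AX Åland Mariehamn"

# Encode it the way a UTF-8 text file would store it on disk.
raw = line.encode("utf-8")
print(raw)  # the "Å" is stored as the two bytes 0xc3 0x85

# Decoding those bytes as ASCII fails on the first byte above 0x7f.
error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the encoding that was used for writing works.
decoded = raw.decode("utf-8")
print(decoded)
```

The first three bytes ("AX ") are plain ASCII, so the decoder only stops at index 3, where the non-ASCII character starts.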

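I therefore conclude that Python is, for some reason, assuming that your file is ASCII-encoded. Check this first by running locale.getpreferredencoding(False), which is the default that open() uses when no encoding is given (sys.getdefaultencoding() always returns 'utf-8' on Python 3, so it will not reveal the problem). Do you know the actual encoding of your file? The bytes 0xc2 and 0xc3 in the two errors are typical lead bytes of two-byte UTF-8 sequences, so UTF-8 is a likely candidate. Try changing the encoding in the open() call (for example, to encoding='utf-8' or encoding='iso-8859-1') and see if that helps.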
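As a rough sketch of that suggestion, the function below passes the encoding explicitly instead of relying on the platform default. The name extract_codes, the two-line sample, and the temporary file are all hypothetical stand-ins for the real country_codes.txt, and UTF-8 is an assumption about how that file is encoded.

```python
import locale
import os
import tempfile


def extract_codes(path, encoding="utf-8"):
    """Return the first whitespace-separated field of each non-empty line."""
    codes = []
    with open(path, encoding=encoding) as infile:
        for line in infile:
            fields = line.split()
            if fields:  # skip blank lines instead of raising IndexError
                codes.append(fields[0])
    return codes


# The encoding open() falls back to when none is passed; the value
# depends on the platform and locale settings.
print(locale.getpreferredencoding(False))

# Hypothetical stand-in for the real file, written as UTF-8.
sample = "AX Åland Mariehamn\nBL Saint Barthélemy Gustavia\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "country_codes.txt")
    with open(path, "w", encoding="utf-8") as out:
        out.write(sample)
    codes = extract_codes(path)

print(codes)
```

Returning the list rather than appending to a module-level variable is a design choice of this sketch, not something the original code requires.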

solurker
  • I wonder if Python 3 is picking up the `site.py` from the Python 2 installation? Additional background: http://stackoverflow.com/questions/2276200/changing-default-encoding-of-python – cdarke Feb 20 '16 at 22:56

Use the codecs library to solve this. Import it, then replace the line that opens the file with this one:

import codecs

with codecs.open('country_codes.txt', 'r', 'utf-8') as infile:

  • Since the question states Python 3.4, the codecs library is not needed in this case, as the `open` call has an `encoding` [option](https://docs.python.org/3.4/library/functions.html#open) – Featherlegs May 09 '17 at 15:26
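A small sketch of that point, comparing the two calls on the same data. The file contents are a made-up sample, written as UTF-8 bytes to a temporary file, and the variable names are illustrative only.

```python
import codecs
import os
import tempfile

# Made-up sample standing in for the real file; written as raw UTF-8 bytes
# so that both readers see exactly the same data on every platform.
sample = "AX Åland Mariehamn\nBL Saint Barthélemy Gustavia\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "country_codes.txt")
    with open(path, "wb") as out:
        out.write(sample.encode("utf-8"))

    # Built-in open() with an explicit encoding (available on Python 3).
    with open(path, encoding="utf-8") as infile:
        with_builtin = [line.split()[0] for line in infile]

    # codecs.open() with the same encoding.
    with codecs.open(path, "r", "utf-8") as infile:
        with_codecs = [line.split()[0] for line in infile]

print(with_builtin)
print(with_codecs)
```

On this sample both approaches yield the same list, which is why the built-in call is usually enough on Python 3.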