geograpy3 library for extracting the locations in the text, gives UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 276

Question

I am trying to extract locations from text using the geograpy3 library in Python.

import geograpy
address = 'Jersey City New Jersey 07306'
places = geograpy.get_place_context(text = address)

I get the following UnicodeDecodeError:

 ~\Anaconda\lib\site-packages\geograpy\places.py in populate_db(self)
 28         with open(cur_dir + "/data/GeoLite2-City-Locations.csv") as info:
 29             reader = csv.reader(info)
---> 30             for row in reader:
 31                 print(row)
 32                 cur.execute("INSERT INTO cities VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?);", row)

~\Anaconda\lib\encodings\cp1252.py in decode(self, input, final)
 21 class IncrementalDecoder(codecs.IncrementalDecoder):
 22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
 24 
 25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 276: character maps to <undefined>

After some investigation, I modified places.py and added encoding="utf-8" to the open() call whose file is being read at line ---> 30:

with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:

But I still get the same error. I also copied GeoLite2-City-Locations.csv to my Desktop and read it with the same code:

import csv

with open("GeoLite2-City-Locations.csv", encoding="utf-8") as info:
    reader = csv.reader(info)
    for row in reader:
        print(row)

This works fine and prints every row of GeoLite2-City-Locations.csv, so I don't understand what the problem is.
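The byte named in the error is a useful clue. The traceback shows the file being decoded with cp1252 (the locale encoding on this Windows machine), and cp1252 has no character at 0x8d, whereas in UTF-8 0x8d is a valid continuation byte. The sketch below reproduces the same error on a two-byte string, assuming the CSV holds UTF-8 text such as "Í" (U+00CD, encoded as C3 8D); that character is an illustration, not one taken from the actual file.

```python
# Sketch: why cp1252 rejects byte 0x8d while UTF-8 accepts it.
# Assumption: the CSV contains UTF-8 text such as "Í" (U+00CD),
# which UTF-8 encodes as the two bytes C3 8D.
data = "\u00cd".encode("utf-8")
print(data)  # b'\xc3\x8d'

# Decoding with the encoding the bytes were written in round-trips cleanly.
print(data.decode("utf-8"))

# cp1252 maps 0xC3 to "Ã" but leaves 0x8d undefined, so this raises the
# same "'charmap' codec can't decode byte 0x8d" error as the traceback.
try:
    data.decode("cp1252")
except UnicodeDecodeError as exc:
    print(exc)
```

In other words, the file is fine; it is simply being read with the wrong codec whenever encoding= is not passed.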

score 1 · Answer 1 · answered Sep 09 '20 at 11:26

As a committer of geograpy3, I tried to reproduce your issue by adding a test to the most recent geograpy3 (https://github.com/somnathrakshit/geograpy3/blob/master/tests/test_extractor.py):

def testStackoverflow54077973(self):
    '''
    see https://stackoverflow.com/questions/54077973/geograpy3-library-for-extracting-the-locations-in-the-text-gives-unicodedecodee
    '''
    address = 'Jersey City New Jersey 07306'
    e=Extractor(text=address)
    e.find_entities()
    self.check(e.places,['Jersey','City'])

with the result:

['Jersey', 'City']

so you can probably just switch to the latest version.
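Before switching, it may help to check which geograpy3 release is actually installed. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); it assumes the package is published on PyPI under the distribution name geograpy3 (the import name is geograpy), and the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: geograpy3 is installed under the distribution name "geograpy3".
print(installed_version("geograpy3"))
```

If this prints an old version (or None), `pip install --upgrade geograpy3` should fetch the latest release, again assuming that PyPI name.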

score 0 · Answer 2 · answered Jan 24 '19 at 13:38


You should specify encoding='utf-8' as you did, but also in the correct_country_mispelling(self, s) method in places.py (line 49).
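The point here is that places.py opens more than one file, so every open() call needs the encoding, not just the one in populate_db. One way to find the remaining calls is to scan the source with the standard ast module. The function name opens_without_encoding and the sample source below are hypothetical; the sample only mimics the two methods mentioned in this thread, and "other_data.csv" is a placeholder, not the real file name.

```python
import ast
from typing import List


def opens_without_encoding(source: str) -> List[int]:
    """Return line numbers of open(...) calls that pass no encoding."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"):
            continue
        # encoding is open()'s 4th positional parameter, or the encoding= keyword.
        has_encoding = len(node.args) >= 4 or any(
            kw.arg == "encoding" for kw in node.keywords
        )
        if not has_encoding:
            lines.append(node.lineno)
    return sorted(lines)


# Hypothetical excerpt mimicking places.py; "other_data.csv" is a placeholder name.
sample = '''
def populate_db(self):
    with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:
        pass

def correct_country_mispelling(self, s):
    with open(cur_dir + "/data/other_data.csv") as info:
        pass
'''

print(opens_without_encoding(sample))  # [7]
```

To scan the installed module itself, inspect.getsource(geograpy.places) could supply the source, assuming the module path shown in the traceback (geograpy/places.py).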


kek5


score 0 · Answer 3 · answered Dec 18 '19 at 14:18

After some investigation, I found that in some cases this is a Windows-vs-Linux problem. Even with

with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:

I could not resolve the error on my Windows computer. However, the exact same code ran fine on a Linux computer I also use. When I opened GeoLite2-City-Locations.csv on Linux, LibreOffice appeared to decode all the characters correctly, whereas when I opened the same file in Excel, the garbled characters that cause the error were still there. For some reason, Excel insists on keeping the odd characters.
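A likely explanation for the Windows/Linux difference: when open() is called without encoding=, Python falls back to the locale's preferred encoding, which is commonly cp1252 on Western Windows setups and commonly UTF-8 on Linux. The sketch below prints that default and then round-trips a small CSV with an explicit encoding. The sample rows are illustrative, not taken from the GeoLite2 file, and treating the GeoLite2 CSVs as UTF-8 is an assumption consistent with the question's Desktop test.

```python
import csv
import locale
import os
import sys
import tempfile

# open() without encoding= falls back to the locale's preferred encoding:
# commonly cp1252 on Western Windows setups, commonly UTF-8 on Linux.
print("locale default:", locale.getpreferredencoding(False))
# 1 when Python runs in UTF-8 mode (python -X utf8 or PYTHONUTF8=1), else 0.
print("UTF-8 mode:", sys.flags.utf8_mode)

# Illustrative sample rows standing in for GeoLite2-City-Locations.csv content.
rows = [["geoname_id", "city_name"], ["1", "Bogot\u00e1"], ["2", "\u00cdsafj\u00f6r\u00f0ur"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    # Write the sample as UTF-8 (assumed to match the GeoLite2 CSV files).
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)
    # Passing encoding= explicitly makes the read identical on Windows and Linux.
    with open(path, encoding="utf-8", newline="") as f:
        read_back = list(csv.reader(f))

print(read_back)
```

On Python 3.7+, running the script with `python -X utf8` or setting the environment variable PYTHONUTF8=1 makes UTF-8 the default for open(), which may avoid editing places.py at all (untested against geograpy3).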

