
I am trying to extract locations from text using the geograpy3 library in Python.

import geograpy
address = 'Jersey City New Jersey 07306'
places = geograpy.get_place_context(text = address)

This raises the following UnicodeDecodeError:

 ~\Anaconda\lib\site-packages\geograpy\places.py in populate_db(self)
 28         with open(cur_dir + "/data/GeoLite2-City-Locations.csv") as info:
 29             reader = csv.reader(info)
---> 30             for row in reader:
 31                 print(row)
 32                 cur.execute("INSERT INTO cities VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?);", row)

~\Anaconda\lib\encodings\cp1252.py in decode(self, input, final)
 21 class IncrementalDecoder(codecs.IncrementalDecoder):
 22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
 24 
 25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 276: character maps to <undefined>

After some investigation, I modified places.py and added encoding="utf-8" to the open() call just above line 30 in the traceback:

with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:

But it still gives me the same error. I also saved GeoLite2-City-Locations.csv to my Desktop and read it with the same code:

with open("GeoLite2-City-Locations.csv", encoding="utf-8") as info:
    reader = csv.reader(info)
    for row in reader:
        print(row)

This works absolutely fine and prints every row of GeoLite2-City-Locations.csv. I fail to understand the problem!

Wolfgang Fahl

3 Answers


As a committer of geograpy3, I added a test (shown below) to the most recent geograpy3 to reproduce your issue: https://github.com/somnathrakshit/geograpy3/blob/master/tests/test_extractor.py

The test runs without the error and gives the result:

['Jersey', 'City']

So you might simply switch to the latest version.

def testStackoverflow54077973(self):
    '''
    see https://stackoverflow.com/questions/54077973/geograpy3-library-for-extracting-the-locations-in-the-text-gives-unicodedecodee
    '''
    address = 'Jersey City New Jersey 07306'
    e = Extractor(text=address)
    e.find_entities()
    self.check(e.places, ['Jersey', 'City'])
Wolfgang Fahl

You should specify encoding='utf-8' the same way you did, but in the correct_country_mispelling(self, s) method in places.py (line 49).
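To illustrate why every open() call on a UTF-8 file needs the explicit encoding, here is a minimal sketch that reproduces the same kind of failure outside geograpy. The helper read_rows and the sample row are made up for the demonstration and are not part of geograpy; the sample contains "č", whose UTF-8 bytes are 0xC4 0x8D, and 0x8D has no mapping in cp1252.

```python
import csv
import os
import tempfile


def read_rows(path, encoding):
    """Read all rows of a CSV file using an explicit encoding (hypothetical helper)."""
    with open(path, encoding=encoding, newline="") as info:
        return list(csv.reader(info))


# Write a tiny UTF-8 CSV file. The row is invented sample data.
fd, sample_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", encoding="utf-8", newline="") as out:
    csv.writer(out).writerow(["1", "Počátky"])

# Reading it as cp1252 (a common Windows default) fails on byte 0x8d.
try:
    read_rows(sample_path, "cp1252")
    cp1252_error = None
except UnicodeDecodeError as exc:
    cp1252_error = exc

# Reading it as UTF-8 succeeds.
utf8_rows = read_rows(sample_path, "utf-8")
os.remove(sample_path)

print(cp1252_error)
print(utf8_rows)
```

If one open() call in places.py is patched but another one is not, the unpatched call can still raise the same error, which would match what the question describes.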

kek5

After some investigation, this seems to be a Windows vs. Linux difference in some cases. Even with

with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:

I could not resolve the error on my Windows computer. However, the exact same code ran fine on a Linux computer I also use. I looked at the City-Locations.csv file on Linux, and it appeared that LibreOffice had automatically encoded and/or resolved all the characters, whereas in Excel the same file still showed all the funky characters causing the error. Excel for some reason insists on keeping the odd characters.
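One possible explanation for the platform difference, offered as an assumption rather than something verified against geograpy itself: when open() is called without an encoding argument, Python falls back to a platform-dependent default, which is often cp1252 on Windows and usually UTF-8 on Linux. A small sketch to check what a given machine uses (default_text_encoding is just an illustrative name):

```python
import locale


def default_text_encoding():
    """Return the locale encoding that open() typically falls back to
    when no encoding argument is passed (illustrative helper)."""
    return locale.getpreferredencoding(False)


print(default_text_encoding())
```

If this prints cp1252 on the Windows machine and UTF-8 on the Linux one, any open() call in the library that lacks encoding="utf-8" would behave differently on the two systems.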

Sam Dean