
I can't figure out why geojson escapes non-ASCII characters (is it UTF-16?), so that "François" ends up as "Fran\u00e7ois" when I call geojson.dump() to save data back to disk.

Any idea?

import geojson
from geojson import LineString, Point, Feature, FeatureCollection, dump
from geopy.geocoders import Nominatim

with open('input.geojson', encoding='utf-8') as f:
    gj = geojson.load(f)

for track in gj['features']:
    # Adding encoding='utf-8' to open() makes no difference:
    # with open(track['properties']['name'][0] + '.geojson', 'a+', encoding='utf-8') as f:
    with open(track['properties']['name'][0] + '.geojson', 'a+') as f:
        dump(track, f, indent=2)

        # UnicodeEncodeError: 'charmap' codec can't encode character '\u2194' in position 7: character maps to <undefined>
        # dump(track, f, indent=2, ensure_ascii=False)

        # NameError: name 'dumps' is not defined
        # dumps(track, f, indent=2)

        # AttributeError: encode
        # dump(track.encode("utf-8"), f, indent=2, ensure_ascii=False)

Thank you.

--

Edit:

I followed the other thread and tried several things, but I still cannot preserve the original text.

INPUTFILE = 'input.geojson'

with open(INPUTFILE, encoding='utf-8') as f:
    gj = geojson.load(f)

for track in gj['features']:
    # Bad: \u00e0 instead of "à"
    ## with open(track['properties']['name'][0] + '.geojson', 'a+') as f:
    ##     dump(track, f, indent=2)

    # Bad: "Ã " instead of "à"
    ## with io.open(track['properties']['name'][0] + '.geojson', 'a+', encoding='utf8') as json_file:
    ##     dump(track, json_file, ensure_ascii=False)

    ## with io.open(track['properties']['name'][0] + '.geojson', 'a+', encoding='utf8') as json_file:
    ##     # NameError: name 'dumps' is not defined
    ##     data = dumps(track, ensure_ascii=False)
    ##     # unicode(data) auto-decodes data to unicode if str
    ##     json_file.write(unicode(data))

    # Bad: "Ã " instead of "à"
    ## with codecs.open(track['properties']['name'][0] + '.geojson', 'a+', encoding='utf-8') as f:
    ##     dump(track, f, ensure_ascii=False)

    ## with codecs.open(track['properties']['name'][0] + '.geojson', 'a+', 'utf-8') as fp:
    ##     # NameError: name 'dumps' is not defined
    ##     fp.write(dumps(track, ensure_ascii=False))

    # Bad: "Ã " instead of "à"
    ## with io.open(track['properties']['name'][0] + '.geojson', 'a+', encoding='utf8') as json_file:
    ##     json.dump(track, json_file, ensure_ascii=False)

    break
Gulbahar
  • Possible duplicate of [Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence](https://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence) – Anto Jurković Aug 29 '18 at 10:33
  • JSON allows both escaped and unescaped non-ASCII characters, and since `dump` is a wrapper around the standard `json` function, the characters are stored in escaped form by default. Are these files going to be read by humans? If not, is there a reason to care about how the text is stored? – Chillie Aug 29 '18 at 10:33
  • Yes, I'd rather have readable data. – Gulbahar Aug 29 '18 at 16:45
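
The behaviour discussed in these comments can be reproduced with the standard `json` module alone. The following is a minimal sketch, not a confirmed fix for the question: the `feature` dictionary and the `track.geojson` file name are made up for illustration. It assumes that `geojson.dump` forwards keyword arguments such as `ensure_ascii` to `json.dump`, which the `UnicodeEncodeError` reported in the question suggests. It also shows how the "charmap" error and the "Ã " garbling can both arise from a cp1252/UTF-8 mismatch.

```python
import json
import os
import tempfile

# Hypothetical stand-in for one feature from the question's input file.
feature = {"type": "Feature", "geometry": None,
           "properties": {"name": "François"}}

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(feature)

# ensure_ascii=False leaves the characters as they are.
readable = json.dumps(feature, ensure_ascii=False)

# Both forms are valid JSON and decode to the same data.
assert json.loads(escaped) == json.loads(readable)

# Write with an explicit encoding so the platform default
# (often cp1252 on Windows) is not used, then read back as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "track.geojson")
with open(path, "w", encoding="utf-8") as f:
    json.dump(feature, f, indent=2, ensure_ascii=False)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

# cp1252 has no mapping for U+2194, which matches the 'charmap' error
# seen when the file is opened without encoding='utf-8'.
try:
    "\u2194".encode("cp1252")
    cp1252_can_encode_arrow = True
except UnicodeEncodeError:
    cp1252_can_encode_arrow = False

# UTF-8 bytes interpreted as cp1252 turn one character into two.
mojibake = "à".encode("utf-8").decode("cp1252")
```

If the output file contains "à" when read as UTF-8 but shows up as "Ã " in an editor, the editor is probably interpreting the file as cp1252; in that case the file itself may already be correct. Note also that the `'a+'` mode used in the question appends on every run, so repeated runs leave several JSON documents in the same file.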

0 Answers