
I am generating a Python dictionary as follows:

placedict = {
    "id": geonames.geonames_id,
    "info": json.dumps(jsoninfo),
}

where id is a string and info a valid and readable JSON string:

'{"geonamesurl": "http://geonames.org/310859/kahramanmara\\u015f.html", "searchstring": "Kahramanmara\\u015f", "place": "Kahramanmara\\u015f", "confidence": 1, "typecode": "PPLA", "toponym": "Kahramanmara\\u015f", "geoid": 310859, "continent": "AS", "country": "Turkey", "state": "Kahramanmara\\u015f", "region": "Kahramanmara\\u015f", "lat": "37.5847", "long": "36.92641", "population": 376045, "bbox": {"northeast": [37.66426194452945, 37.02690583904019], "southwest": [37.50514805547055, 36.825904160959816]}, "timezone": "Europe/Istanbul", "wikipedia": "en.wikipedia.org/wiki/Kahramanmara%C5%9F", "hyerlist": ["part-of: Earth GeoID: 6295630 GeoCode: AREA", "part-of: Asia GeoID: 6255147 GeoCode: CONT", "part-of: Turkey GeoID: 298795 GeoCode: PCLI", "part-of: Kahramanmara\\u015f GeoID: 310858 GeoCode: ADM1", "part-of: Kahramanmara\\u015f GeoID: 310859 GeoCode: PPLA"], "childlist": ["Aksu", "Barbaros", "Egemenlik"]}'

but as you can see, while the jsoninfo variable holds valid UTF-8 characters, the characters in placedict['info'] are not UTF-8 encoded but escaped (as \u015f and similar sequences). I therefore tried to change the json.dumps line to:

placedict = {
    "id": geonames.geonames_id,
    "info": json.dumps(jsoninfo).encode("utf-8"),
}

or even

placedict = {
    "id": geonames.geonames_id,
    "info": json.dumps(jsoninfo, ensure_ascii=False).encode("utf-8"),
}

hoping this would encode the JSON as desired, but after either of these modifications the "info" member of the dictionary comes back as b'.........', so I end up with a binary string in MongoDB.
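A minimal sketch of what each variant actually produces, using a shortened, hypothetical jsoninfo. The default json.dumps output is still a str, just with non-ASCII characters escaped as \uXXXX (which is valid, lossless JSON). Calling .encode("utf-8") turns that text into bytes, and bytes are what end up stored as binary. ensure_ascii=False on its own keeps a readable str.

```python
import json

# Hypothetical sample data containing a non-ASCII character (ş, U+015F).
jsoninfo = {"place": "Kahramanmaraş"}

escaped = json.dumps(jsoninfo)                       # default: ensure_ascii=True
readable = json.dumps(jsoninfo, ensure_ascii=False)  # keeps ş as a real character
encoded = readable.encode("utf-8")                   # str -> bytes

print(type(escaped).__name__, escaped)    # str {"place": "Kahramanmara\u015f"}
print(type(readable).__name__, readable)  # str {"place": "Kahramanmaraş"}
print(type(encoded).__name__, encoded)    # bytes b'{"place": "Kahramanmara\xc5\x9f"}'

# Both text forms decode back to the same Python data.
assert json.loads(escaped) == json.loads(readable) == jsoninfo
```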

I want to store the dictionary in MongoDB with a UTF-8 encoded, readable JSON string.

Where am I making a mistake?

Robert Alexander
  • It's the `.encode(...)` call that *encodes* the text to bytes. Just get rid of that. You just want the `ensure_ascii=False` option. – deceze Feb 08 '23 at 12:58
  • If you're saving this to Mongo, I'd question why you're dumping to JSON manually though. Don't you want to save an object in an object in Mongo? Why store the dict as string in a dict? – deceze Feb 08 '23 at 13:00
  • @deceze These are my very first minutes working with MongoDB, or with any NoSQL database, so I'm very probably doing it wrong :) I'll try to study and understand what you're suggesting. Any tips? – Robert Alexander Feb 08 '23 at 13:07
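Following deceze's second comment, here is a minimal sketch of storing jsoninfo as an embedded sub-document instead of a JSON string. It assumes pymongo is installed. build_place_document and store_place are hypothetical helper names, the id "310859" is a stand-in for geonames.geonames_id, and the URI and database/collection names in the commented usage are placeholders. pymongo stores Python str values as UTF-8 BSON strings, so no manual encoding is involved.

```python
import json


def build_place_document(geonames_id, jsoninfo):
    """Return a document that embeds jsoninfo as a nested object.

    jsoninfo may be a dict or a JSON string; a string is parsed first.
    """
    if isinstance(jsoninfo, str):
        jsoninfo = json.loads(jsoninfo)
    return {"id": geonames_id, "info": jsoninfo}


def store_place(collection, geonames_id, jsoninfo):
    # collection is assumed to be a pymongo Collection; insert_one encodes
    # Python str values as UTF-8 BSON strings, so no .encode() is needed.
    return collection.insert_one(build_place_document(geonames_id, jsoninfo))


doc = build_place_document("310859", {"place": "Kahramanmaraş", "geoid": 310859})
print(doc)  # {'id': '310859', 'info': {'place': 'Kahramanmaraş', 'geoid': 310859}}

# Hypothetical usage against a local server (URI and names are placeholders):
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017")
# store_place(client["geo"]["places"], "310859", jsoninfo)
```

Stored this way, "info" is a queryable nested object (for example, a filter on "info.country") rather than an opaque string.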

1 Answer


You can just use json.dumps with ensure_ascii=False:

import json
jsoninfo = {"El":"Niño"}
info = json.dumps(jsoninfo, ensure_ascii=False)
print(info)  # {"El": "Niño"}
Daweo
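Applied to the question's own placedict, the fix amounts to keeping ensure_ascii=False and dropping .encode(). A minimal sketch; geonames_id and the shortened jsoninfo below are hypothetical stand-ins for geonames.geonames_id and the real data.

```python
import json

# Hypothetical stand-ins for the question's geonames.geonames_id and jsoninfo.
geonames_id = "310859"
jsoninfo = {"place": "Kahramanmaraş", "country": "Turkey", "geoid": 310859}

placedict = {
    "id": geonames_id,
    # No .encode(): keep a str so MongoDB stores a readable string, not binary.
    "info": json.dumps(jsoninfo, ensure_ascii=False),
}
print(placedict["info"])  # {"place": "Kahramanmaraş", "country": "Turkey", "geoid": 310859}
```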