-2

I'm importing a JSON file in Python, but the file is full of accented characters in the city names (from Portuguese), and I need to remove them from the file for further use. For example, 'São Paulo', 'Santo André' and 'Foz do Iguaçu' should become Sao Paulo, Santo Andre and Foz do Iguacu in the JSON.

    { "type": "FeatureCollection", "features": [ 
        { "type": "Feature", "properties": {"id": "1100015", "name": "São Paulo", "description": "Alta Floresta D'Oeste"}, "geometry": { "type": "Polygon", "coordinates": [-62.1820888570, -11.8668597878] }},
        { "type": "Feature", "properties": {"id": "1100023", "name": "Santo André", "description": "Ariquemes"}, "geometry": { "type": "Polygon", "coordinates": [-62.5359497334, -9.7318235272] }},
        { "type": "Feature", "properties": {"id": "1100031", "name": "Foz do Iguaçu", "description": "Cabixi"}, "geometry": { "type": "Polygon", "coordinates": [-60.3993982597, -13.4558418276] }}
    ]}
yyz_vanvlet
    Why do they need to be removed? Python handles Unicode well and accents are part of the language. Just wondering if this is an XY problem. – Mark Tolonen Nov 13 '20 at 17:36

2 Answers

1

Use unidecode (pip install Unidecode), which transliterates Unicode text to its closest ASCII representation :)

import unidecode
import json

places_json = '''
        { "type": "FeatureCollection", 
        "features": [ 
        { "type": "Feature", "properties": {"id": "1100015", "name": "São Paulo", "description": "Alta Floresta D'Oeste"}, "geometry": { "type": "Polygon", "coordinates": [-62.1820888570, -11.8668597878] }},
        { "type": "Feature", "properties": {"id": "1100023", "name": "Santo André", "description": "Ariquemes"}, "geometry": { "type": "Polygon", "coordinates": [-62.5359497334, -9.7318235272] }},
        { "type": "Feature", "properties": {"id": "1100031", "name": "Foz do Iguaçu", "description": "Cabixi"}, "geometry": { "type": "Polygon", "coordinates": [-60.3993982597, -13.4558418276] }}
                    ]
        }
        '''
json_dec = unidecode.unidecode(places_json)
print(json.loads(json_dec))
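
If installing unidecode is not an option, a standard-library sketch can cover the accents in the question. This is a different technique from the one above, not a drop-in equivalent: it only removes combining marks after Unicode decomposition, so letters with no decomposition (for example 'ø' or 'ß') are left unchanged. The helper name strip_accents is hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining accent marks using only the standard library."""
    # NFD splits an accented letter into its base letter plus combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every character except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for name in ["São Paulo", "Santo André", "Foz do Iguaçu"]:
    print(strip_accents(name))
# Sao Paulo
# Santo Andre
# Foz do Iguacu
```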
Alexander Riedel
0

@alexander-riedel has the right idea, but I think the implementation is wrong: you have JSON, so you shouldn't run the conversion over the whole document as one string.

Instead, loop through the features and convert the values individually. It looks like only the names need converting, so you can do:

from unidecode import unidecode

data = { "type": "FeatureCollection", "features": [ 
    { "type": "Feature", "properties": {"id": "1100015", "name": "São Paulo", "description": "Alta Floresta D'Oeste"}, "geometry": { "type": "Polygon", "coordinates": [-62.1820888570, -11.8668597878] }},
    { "type": "Feature", "properties": {"id": "1100023", "name": "Santo André", "description": "Ariquemes"}, "geometry": { "type": "Polygon", "coordinates": [-62.5359497334, -9.7318235272] }},
    { "type": "Feature", "properties": {"id": "1100031", "name": "Foz do Iguaçu", "description": "Cabixi"}, "geometry": { "type": "Polygon", "coordinates": [-60.3993982597, -13.4558418276] }}
]}

# modify in place
for feature in data["features"]:
    feature["properties"]["name"] = unidecode(feature["properties"]["name"])
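
Since the question starts from a file rather than an inline dict, the same loop can be wrapped in a load, convert, save round trip. The following is only a sketch under assumptions: convert_names, src_path and dst_path are hypothetical names, the file is assumed to be UTF-8 with the structure shown in the question, and convert is whatever callable you choose (for example unidecode as imported above).

```python
import json
from typing import Callable


def convert_names(src_path: str, dst_path: str,
                  convert: Callable[[str], str]) -> dict:
    """Load the JSON file, convert each feature's name, and save the result."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)

    # Only properties.name is touched; ids, descriptions and geometry stay as-is.
    for feature in data["features"]:
        props = feature["properties"]
        props["name"] = convert(props["name"])

    with open(dst_path, "w", encoding="utf-8") as dst:
        # ensure_ascii=False writes any remaining non-ASCII text literally
        # instead of as \uXXXX escapes.
        json.dump(data, dst, ensure_ascii=False, indent=2)
    return data
```

A call might look like convert_names("cities.json", "cities_ascii.json", unidecode), where both file names are placeholders.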
blueteeth