I want to read a 20 MB JSON file, filter out some entries, and write the result to a new JSON file in order to make the file smaller. The file's charset is us-ascii.
I came up with the following Python code, but special characters like é or ö end up being written as \u00e9 and similar escape sequences. I've tried decode() and json.dumps, but I am not experienced in working with decoding and encoding functions. I would be grateful if somebody could give me a hint on what to do.
Is there maybe a better way of doing this? It doesn't have to be Python; I just thought Python would be the easiest. I am working in a Python 2.7 environment.
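import json

new_arr = []
with open('airports.json', 'r') as data_file:
    data = json.load(data_file)

# Keep everything except heliports and closed airports.
for i in data:
    if i['type'] != 'heliport' and i['type'] != 'closed':
        new_arr.append(i)

# ensure_ascii defaults to True, so every non-ASCII character
# is written as a \uXXXX escape sequence.
with open('airports_withoutheliports.json', 'w') as f:
    json.dump(new_arr, f, indent=1)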
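The \u00e9 sequences are not data loss: by default json.dump and json.dumps use ensure_ascii=True, which escapes every non-ASCII character, and any JSON parser turns the escape back into the same character. A minimal sketch, assuming Python 3 string semantics, that shows the two output forms side by side:

```python
import json

# A record with a non-ASCII character, taken from the sample entry in the question.
record = {"municipality": "Puerto Peñasco"}

escaped = json.dumps(record)                      # default: ensure_ascii=True
literal = json.dumps(record, ensure_ascii=False)  # keep characters as they are

print(escaped)  # {"municipality": "Puerto Pe\u00f1asco"}
print(literal)  # {"municipality": "Puerto Peñasco"}

# Both forms decode back to the same Python object.
assert json.loads(escaped) == json.loads(literal) == record
```

So the escaped file is still valid and equivalent JSON; ensure_ascii=False only matters if you want the characters to appear literally (and, for text with many accented characters, slightly smaller).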
Edit: Here is one entry of the JSON file:
{
  "id":45229,
  "ident":"AMC",
  "type":"large_airport",
  "name":"Mar de Cortés International Airport",
  "latitude_deg":31.351621252,
  "longitude_deg":-113.305864334,
  "elevation_ft":71,
  "continent":"NA",
  "iso_country":"MX",
  "iso_region":"MX-SON",
  "municipality":"Puerto Peñasco",
  "scheduled_service":"yes",
  "gps_code":"",
  "iata_code":"",
  "local_code":"AMC",
  "home_link":"",
  "wikipedia_link":"http://en.wikipedia.org/wiki/Mar_de_Cort%C3%A9s_International_Airport",
  "keywords":""
}
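The filter in the code keeps an entry unless its "type" is "heliport" or "closed". As a sketch, that test can be pulled out into a small function and checked against trimmed entries shaped like the one above. The helper name keep_entry, the EXCLUDED_TYPES set, and the idents X1 and X2 are hypothetical, not part of the original code or data:

```python
# Hypothetical constant: the excluded types mirror the condition in the question.
EXCLUDED_TYPES = {"heliport", "closed"}


def keep_entry(entry):
    """Return True if the airport entry should be kept in the output."""
    # .get() returns None for entries without a "type" key, so they are kept;
    # the original code used entry['type'], which would raise KeyError instead.
    return entry.get("type") not in EXCLUDED_TYPES


# Trimmed entries; X1 and X2 are made-up idents for illustration.
sample = [
    {"ident": "AMC", "type": "large_airport", "municipality": "Puerto Peñasco"},
    {"ident": "X1", "type": "heliport"},
    {"ident": "X2", "type": "closed"},
]
kept = [e for e in sample if keep_entry(e)]
print([e["ident"] for e in kept])  # ['AMC']
```

Using a set also makes it easy to exclude further types later without growing a chain of != comparisons.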
Edit 2 and solution:
Sorry for the duplicate, though I don't think it is a proper duplicate: my problem was slightly different, as I had to read from and write to a file. I had seen Martijn Pieters' answer but could not put things together; eventually it led me to the solution. When writing to the file, you need io.open() instead of open() (the built-in open() in Python 2.7 has no encoding parameter), and you have to set the ensure_ascii flag to False. In my case, using utf8 or latin-1 didn't matter. Here is the working code:
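To see what ensure_ascii=False together with an explicit file encoding actually puts on disk, here is a minimal Python 3 sketch that writes one record to a temporary file (the temporary path exists only for this demo) and inspects the raw bytes. The é is stored as the two UTF-8 bytes C3 A9 instead of the six ASCII characters \u00e9:

```python
import io
import json
import os
import tempfile

record = {"name": "Mar de Cortés International Airport"}

# Temporary file used only for this demonstration.
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)

try:
    # In Python 3, io.open is the same function as the built-in open.
    with io.open(path, "w", encoding="utf8") as out:
        json.dump(record, out, ensure_ascii=False)

    # Read the raw bytes back to see how the characters were stored.
    with io.open(path, "rb") as raw:
        data = raw.read()
finally:
    os.remove(path)

print(data)  # b'{"name": "Mar de Cort\xc3\xa9s International Airport"}'
```

With encoding="latin-1" the same character would be stored as the single byte E9, which is why the choice of encoding changes the file's bytes even if both versions read back correctly with the matching encoding.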
import io
import json

array = []
# io.open accepts an encoding argument in both Python 2.7 and Python 3.
with io.open("airports.json", 'r', encoding='utf8') as json_file_in:
    json_data = json.load(json_file_in)

for i in json_data:
    if i['type'] != 'heliport' and i['type'] != 'closed':
        array.append(i)

# ensure_ascii=False keeps é, ö, ... as real characters instead of \u escapes.
with io.open('airports_withoutheliports.json', 'w', encoding="utf8") as json_file_out:
    json.dump(array, json_file_out, ensure_ascii=False, indent=1)
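The same steps can be wrapped in a reusable function. This is a sketch that assumes Python 3, where open accepts encoding directly; the function name filter_airports, its parameters, and the tiny demo input are hypothetical. Under Python 2.7, json.dump can hand plain str chunks to a file opened in text mode with io.open, which raises a TypeError there; a commonly used workaround in 2.7 is to build the whole string first and write unicode(json.dumps(data, ensure_ascii=False, indent=1)).

```python
import json
import os
import tempfile


def filter_airports(src_path, dst_path, excluded=("heliport", "closed")):
    """Copy src_path to dst_path, dropping entries whose type is excluded.

    Returns the number of entries written.
    """
    with open(src_path, "r", encoding="utf8") as json_file_in:
        entries = json.load(json_file_in)

    kept = [e for e in entries if e["type"] not in excluded]

    with open(dst_path, "w", encoding="utf8") as json_file_out:
        # ensure_ascii=False writes é, ö, ñ, ... as real characters.
        json.dump(kept, json_file_out, ensure_ascii=False, indent=1)
    return len(kept)


# Demo with a tiny hypothetical input file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "airports.json")
    dst = os.path.join(tmp, "airports_withoutheliports.json")
    with open(src, "w", encoding="utf8") as f:
        json.dump([
            {"ident": "AMC", "type": "large_airport", "municipality": "Puerto Peñasco"},
            {"ident": "X1", "type": "heliport"},
        ], f)

    count = filter_airports(src, dst)
    with open(dst, encoding="utf8") as f:
        text = f.read()

print(count)              # 1
print("Peñasco" in text)  # True
```

For a 20 MB file, loading everything with json.load is fine on most machines; the whole list fits comfortably in memory.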