
I want to read a 20 MB JSON file, filter out some content, and write back to a JSON file in order to make the file smaller. The file's charset is us-ascii.

I came up with the following Python code, but special characters like é or ö end up being written as \u00e9 and similar. I've tried decode() and json.dumps, but I am not experienced in working with decoding and encoding functions.

I would be grateful if somebody could give me a hint on what to do.

Is there maybe a better way of doing this? It doesn't have to be Python; I just thought doing it with Python would be the easiest. I am working in a Python 2.7 environment.

import json

new_arr = []

with open('airports.json', 'r') as data_file:    
    data = json.load(data_file)

for i in data:
    if i['type'] != 'heliport' and i['type'] != 'closed':
        new_arr.append(i)

f = open('airports_withoutheliports.json', 'w')
json.dump(new_arr, f, indent=1)
f.close()

Edit: Here is one entry of the JSON file:

{
"id":45229,
"ident":"AMC",
"type":"large_airport",
"name":"Mar de Cortés International Airport",
"latitude_deg":31.351621252,
"longitude_deg":-113.305864334,
"elevation_ft":71,
"continent":"NA",
"iso_country":"MX",
"iso_region":"MX-SON",
"municipality":"Puerto Peñasco",
"scheduled_service":"yes",
"gps_code":"",
"iata_code":"",
"local_code":"AMC",
"home_link":"",
"wikipedia_link":"http://en.wikipedia.org/wiki/Mar_de_Cort%C3%A9s_International_Airport",
"keywords":""}

Edit 2 and solution:

Sorry for the duplicate, though I don't think it is a proper duplicate. My problem was slightly different, as I had to read from and write to a file. I had seen Martijn Pieters' answer, but I could not put things together. Eventually it led me to the solution to my problem. When writing to a file, you need io.open() instead of open(), and you have to set the ensure_ascii flag to False. In my case, using utf8 or latin-1 didn't matter. Here is the working code:

import json, io

array = []

# io.open() accepts encoding= on Python 2.7; the built-in open() there does not
with io.open("airports.json", 'r', encoding='utf8') as json_file_in:
    json_data = json.load(json_file_in)

for i in json_data:
    if i['type'] != 'heliport' and i['type'] != 'closed':
        array.append(i)

with io.open('airports_withoutheliports.json', 'w', encoding="utf8") as json_file_out:
    json.dump(array, json_file_out, ensure_ascii=False, indent=1)
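To illustrate what the ensure_ascii flag changes, here is a minimal sketch using json.dumps on a one-field record modeled on the sample entry. The names record, escaped, and literal are made up for this illustration, and the u"" prefix is only there so the snippet behaves the same on Python 2.7 and Python 3.

```python
import json

# Hypothetical one-field record based on the sample entry's "name" value.
record = {u"name": u"Mar de Cort\u00e9s International Airport"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are written
# as \uXXXX escape sequences, so the result is plain ASCII.
escaped = json.dumps(record)

# With ensure_ascii=False the character itself is kept in the output,
# which is why the result must then be written with a real text encoding.
literal = json.dumps(record, ensure_ascii=False)

print(escaped)
print(repr(literal))

# Both forms are valid JSON and decode back to the same data.
print(json.loads(escaped) == json.loads(literal))
```

Either form loads back to identical data, so the escaped output is not wrong, only harder to read and slightly larger.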
haffla
  • If the file has got characters like `é` in it, it can't possibly be `ascii`. `é` is a non-ASCII character, and therefore can't be encoded in the `ascii` character set (as Python will tell you if you try it). Chances are its encoding is `iso-8859-1` (also known as `latin1`) or `utf-8`. – Lukas Graf Jul 12 '14 at 11:47
  • If you can post part of the json data, that may be helpful. – parchment Jul 12 '14 at 11:48
  • The reason I said that the file's charset is ascii is because `file -i airports.json` yielded exactly that. – haffla Jul 13 '14 at 12:38
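The point made in these comments can be checked directly. The sketch below uses a sample value taken from the entry in the question and shows that ñ cannot be encoded as ASCII, while the escaped form that json.dumps produces by default is pure ASCII. One possible explanation for the `file -i` result, which the thread does not confirm, is that the input file stored such characters as \uXXXX escapes.

```python
import json

# Sample value from the question's entry; the variable names are illustrative.
name = u"Puerto Pe\u00f1asco"

# Encoding a non-ASCII character with the ascii codec raises an error.
try:
    name.encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(encodable)

# The default json.dumps output escapes the character, so the whole
# document can be encoded as ASCII without any error.
escaped_bytes = json.dumps({u"municipality": name}).encode("ascii")
print(escaped_bytes)
```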
