I need to convert a lot of large files from JSON to CSV format. This is how I did it before:
import csv
import json

# Load the whole JSON document into memory at once
with open('test.json') as f:
    data = json.load(f)

# In Python 3 the CSV output must be opened in text mode with newline='' (not 'wb+')
with open('test.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for x in data:
        writer.writerow([x["host"], x["port"]])
It works as long as the input is well-formed JSON, but now I have the following problems:
- A single file can be about 500 gigabytes. Is there any way to optimize my code and make it faster?
- The main problem is that such a large file contains not-quite-valid JSON, i.e.
{"data":"abc","host":"12.34.56.78","path":"/","port":123}
{"data":"abc","host":"12.34.56.78","path":"/","port":123}
{"data":"abc","host":"12.34.56.78","path":"/","port":123}
...
As you can see, the [ at the beginning and the ] at the end of the file are missing, and there are no commas at the ends of the lines, so the file cannot be parsed and converted correctly as it is (see the sketch after this list).
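For what it's worth, this one-object-per-line layout looks like what is usually called JSON Lines (NDJSON). One idea I had is to skip json.load() entirely and parse each line on its own, roughly like this (just a sketch; 'test.json' and 'test.csv' are placeholder names, and I assume every record has "host" and "port"):

import csv
import json

# Read the input line by line so the whole 500 GB file never has to fit in memory,
# and parse each line as a standalone JSON object -- no [ ] or commas needed.
with open('test.json') as src, open('test.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        x = json.loads(line)
        writer.writerow([x["host"], x["port"]])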
I know I could first pre-process each file, appending "," to the end of every line and adding [ at the start and ] at the end, roughly as in the sketch below, but that is an extra full pass over a very large file, and with many such files it could hurt performance. Or is it somehow possible to do this in my case without losing efficiency and with minimal passes over the file?
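Just to make concrete what I mean by pre-processing, it would look something like this ('fixed.json' is a placeholder name for the repaired copy), and even then json.load() would still try to hold the whole result in memory:

# Write a second, fixed-up copy of the file wrapped in [ ... ] with commas
# between records -- an extra full read/write pass over a very large file.
with open('test.json') as src, open('fixed.json', 'w') as dst:
    dst.write('[')
    first = True
    for line in src:
        line = line.strip()
        if not line:
            continue
        if not first:
            dst.write(',')
        dst.write(line)
        first = False
    dst.write(']')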
Can anyone suggest the most effective solution?