I have two JSON files: data_large (150.1 MB) and data_small (7.5 KB). The content of each file is of the form [{"score": 68}, {"score": 78}]. I need to find the list of unique scores in each file.
While dealing with data_small, I did the following and was able to view its content within 0.1 seconds:
    import json

    with open('data_small') as f:
        content = json.load(f)
    print content  # I'll be applying the logic to find the unique values later.
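For reference, this is the kind of logic I plan to apply afterwards (a minimal sketch; it assumes every element of the parsed list has a "score" key, as in my sample data):

    # Collect the unique score values with a set comprehension
    # over the list produced by json.load().
    unique_scores = {item['score'] for item in content}
    print unique_scores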
But while dealing with data_large, I did the following and my system hung and became so slow that I had to force-shut it down to bring it back to normal speed. It took around 2 minutes just to print its content:
    import json

    with open('data_large') as f:
        content = json.load(f)
    print content  # I'll be applying the logic to find the unique values later.
How can I increase the efficiency of the program when dealing with large data sets?
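Would a streaming parser be the right direction here? A minimal sketch of what I have in mind, assuming the ijson library (an assumption on my part; I haven't benchmarked this against the 150 MB file):

    import ijson

    unique_scores = set()
    # ijson parses the file incrementally, so the whole array never
    # has to sit in memory at once. The 'item.score' prefix yields the
    # "score" value of each element of the top-level JSON list.
    with open('data_large', 'rb') as f:
        for score in ijson.items(f, 'item.score'):
            unique_scores.add(score)
    print unique_scores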