I am trying to load a large JSON object into memory and then perform some operations with the data. However, I am noticing a large increase in RAM after the JSON file is read, even after the object is out of scope.
Here is the code:

import json
import objgraph
import gc
from memory_profiler import profile
@profile
def open_stuff():
    with open("bigjson.json", 'r') as jsonfile:
        d = jsonfile.read()
    jsonobj = json.loads(d)
    objgraph.show_most_common_types()
    del jsonobj  # drop the only reference to the parsed object
    del d        # drop the raw JSON string
    print('d')   # marker so runs are easy to tell apart in the output
    gc.collect() # force a full collection pass
open_stuff()
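To cross-check the numbers memory_profiler reports, the process's peak resident set size can also be read directly with the standard resource module. A minimal sketch, Unix only; note that ru_maxrss is reported in KiB on Linux but in bytes on macOS:

import resource

def peak_rss_mib():
    # Peak resident set size of this process; ru_maxrss is KiB on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

open_stuff()
print('peak RSS: %.1f MiB' % peak_rss_mib())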
I tried running this script on Windows with Python 2.7.12 and on Debian 9 with Python 2.7.13, and the problem only appears with the Python on Linux.
On Windows, when I run the script, it uses a lot of RAM while the JSON object is being read and in scope (as expected), and the memory is released once the operation is done (also as expected).
list 3039184
dict 413840
function 2200
wrapper_descriptor 1199
builtin_function_or_method 819
method_descriptor 651
tuple 617
weakref 554
getset_descriptor 362
member_descriptor 250
d
Filename: testjson.py
Line # Mem usage Increment Line Contents
================================================
5 16.9 MiB 16.9 MiB @profile
6 def open_stuff():
7 16.9 MiB 0.0 MiB with open("bigjson.json", 'r') as jsonfile:
8 197.9 MiB 181.0 MiB d= jsonfile.read()
9 1393.4 MiB 1195.5 MiB jsonobj = json.loads(d)
10 1397.0 MiB 3.6 MiB objgraph.show_most_common_types()
11 402.8 MiB -994.2 MiB del jsonobj
12 221.8 MiB -181.0 MiB del d
13 221.8 MiB 0.0 MiB print ('d')
14 23.3 MiB -198.5 MiB gc.collect()
However, in the Linux environment, over 500 MB of RAM is still in use even though all references to the JSON object have been deleted.
list 3039186
dict 413836
function 2336
wrapper_descriptor 1193
builtin_function_or_method 765
method_descriptor 651
tuple 514
weakref 480
property 273
member_descriptor 250
d
Filename: testjson.py
Line # Mem usage Increment Line Contents
================================================
5 14.2 MiB 14.2 MiB @profile
6 def open_stuff():
7 14.2 MiB 0.0 MiB with open("bigjson.json", 'r') as jsonfile:
8 195.1 MiB 181.0 MiB d= jsonfile.read()
9 1466.4 MiB 1271.3 MiB jsonobj = json.loads(d)
10 1466.8 MiB 0.4 MiB objgraph.show_most_common_types()
11 694.8 MiB -772.1 MiB del jsonobj
12 513.8 MiB -181.0 MiB del d
13 513.8 MiB 0.0 MiB print ('d')
14 513.0 MiB -0.8 MiB gc.collect()
The same script run on Debian 9 with Python 3.5.3 uses less RAM overall, but leaks a proportionate amount.
list 3039266
dict 414638
function 3374
tuple 1254
wrapper_descriptor 1076
weakref 944
builtin_function_or_method 780
method_descriptor 780
getset_descriptor 477
type 431
d
Filename: testjson.py
Line # Mem usage Increment Line Contents
================================================
5 17.2 MiB 17.2 MiB @profile
6 def open_stuff():
7 17.2 MiB 0.0 MiB with open("bigjson.json", 'r') as jsonfile:
8 198.3 MiB 181.1 MiB d= jsonfile.read()
9 1057.7 MiB 859.4 MiB jsonobj = json.loads(d)
10 1058.1 MiB 0.4 MiB objgraph.show_most_common_types()
11 537.5 MiB -520.6 MiB del jsonobj
12 356.5 MiB -181.0 MiB del d
13 356.5 MiB 0.0 MiB print ('d')
14 355.8 MiB -0.8 MiB gc.collect()
What is causing this issue? Both environments are running 64-bit builds of Python.
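One possibility I want to rule out: on Linux, glibc's malloc keeps freed memory in its arenas for reuse instead of returning it to the OS, so the process RSS can stay high even after Python has genuinely freed everything. A minimal sketch to test this, assuming glibc (Linux only), is to force the allocator to trim its arenas after gc.collect() and watch whether RSS drops:

import ctypes

# glibc only: malloc_trim(0) asks the allocator to return free arena
# memory to the OS; it has no effect on memory still in use.
libc = ctypes.CDLL("libc.so.6")
libc.malloc_trim(0)

If RSS drops sharply after this call, the memory was being held by the allocator rather than leaked by Python.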
EDIT: Calling the function several times in a row produces even stranger data. json.loads uses less RAM each time it is called, and after the third call the RAM usage stabilizes, but the RAM "leaked" earlier is still not released.
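For completeness, the driver for these repeated calls (reconstructed to match the profiler trace at the end of this edit) is simply:

@profile
def wow():
    open_stuff()
    open_stuff()
    open_stuff()

wow()

The outputs for the three calls follow.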
list 3039189
dict 413840
function 2339
wrapper_descriptor 1193
builtin_function_or_method 765
method_descriptor 651
tuple 517
weakref 480
property 273
member_descriptor 250
d
Filename: testjson.py
Line # Mem usage Increment Line Contents
================================================
5 14.5 MiB 14.5 MiB @profile
6 def open_stuff():
7 14.5 MiB 0.0 MiB with open("bigjson.json", 'r') as jsonfile:
8 195.4 MiB 180.9 MiB d= jsonfile.read()
9 1466.5 MiB 1271.1 MiB jsonobj = json.loads(d)
10 1466.9 MiB 0.4 MiB objgraph.show_most_common_types()
11 694.8 MiB -772.1 MiB del jsonobj
12 513.9 MiB -181.0 MiB del d
13 513.9 MiB 0.0 MiB print ('d')
14 513.1 MiB -0.8 MiB gc.collect()
list 3039189
dict 413842
function 2339
wrapper_descriptor 1202
builtin_function_or_method 765
method_descriptor 651
tuple 517
weakref 482
property 273
member_descriptor 253
d
Filename: testjson.py
Line # Mem usage Increment Line Contents
================================================
5 513.1 MiB 513.1 MiB @profile
6 def open_stuff():
7 513.1 MiB 0.0 MiB with open("bigjson.json", 'r') as jsonfile:
8 513.1 MiB 0.0 MiB d= jsonfile.read()
9 1466.8 MiB 953.7 MiB jsonobj = json.loads(d)
10 1493.3 MiB 26.6 MiB objgraph.show_most_common_types()
11 723.9 MiB -769.4 MiB del jsonobj
12 723.9 MiB 0.0 MiB del d
13 723.9 MiB 0.0 MiB print ('d')
14 722.4 MiB -1.5 MiB gc.collect()
list 3039189
dict 413842
function 2339
wrapper_descriptor 1202
builtin_function_or_method 765
method_descriptor 651
tuple 517
weakref 482
property 273
member_descriptor 253
d
Filename: testjson.py
Line # Mem usage Increment Line Contents
================================================
5 722.4 MiB 722.4 MiB @profile
6 def open_stuff():
7 722.4 MiB 0.0 MiB with open("bigjson.json", 'r') as jsonfile:
8 722.4 MiB 0.0 MiB d= jsonfile.read()
9 1493.1 MiB 770.8 MiB jsonobj = json.loads(d)
10 1493.4 MiB 0.3 MiB objgraph.show_most_common_types()
11 724.4 MiB -769.0 MiB del jsonobj
12 724.4 MiB 0.0 MiB del d
13 724.4 MiB 0.0 MiB print ('d')
14 722.9 MiB -1.5 MiB gc.collect()
Filename: testjson.py
Line # Mem usage Increment Line Contents
================================================
17 14.2 MiB 14.2 MiB @profile
18 def wow():
19 513.1 MiB 498.9 MiB open_stuff()
20 722.4 MiB 209.3 MiB open_stuff()
21 722.9 MiB 0.6 MiB open_stuff()
EDIT 2: Someone suggested this is a duplicate of Why does my program's memory not release?, but the amount of memory in question is far from the "small pages" discussed in that question.
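A further check that might separate "Python still holds objects" from "the allocator still holds pages" (a sketch; tracemalloc is only available on the Python 3 run) is to compare Python's own accounting of live allocations against the process RSS:

import tracemalloc

tracemalloc.start()
open_stuff()
current, peak = tracemalloc.get_traced_memory()  # bytes
print('Python-level allocations: current=%.1f MiB, peak=%.1f MiB'
      % (current / 2.0**20, peak / 2.0**20))

If the current figure is near zero while RSS stays at ~500 MiB, the pages are being retained by the C allocator, not by live Python objects.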