I ran the following test and was confused by the memory usage at runtime:

test.py:

from func import run

# raw_input() pauses execution so I can watch the process's memory usage
raw_input()
run()
raw_input()

func.py:

import pandas as pd
import numpy as np

# Module-level assignment: 50000 * 1000 float64 values, roughly 400 MB
LARGEDF = pd.DataFrame(np.random.rand(50000, 1000))

def f():
    return LARGEDF.copy()

def run():
    res = f()

When I run python test.py, the process's memory usage already stands at 435,600 KB before I press Enter for the first time, and after I press Enter it just stays at that level.

However, if I move the definition of LARGEDF inside f(), then after I press Enter the memory usage climbs from 44,000 KB to 435,600 KB and immediately drops back to 44,000 KB, which is expected, since the memory allocated for the local variable LARGEDF is released when f() returns. That is, the variant I mean looks roughly like this:
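func.py (the variant described above):

import pandas as pd
import numpy as np

def f():
    # LARGEDF is now local: allocated when f() is called, released when it returns
    LARGEDF = pd.DataFrame(np.random.rand(50000, 1000))
    return LARGEDF.copy()

def run():
    res = f()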

So my question is: from func import run should only bring the name run into test.py's namespace. Why is the memory for LARGEDF allocated anyway, even though that variable is inaccessible from test.py (I checked globals(), as shown below)?
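This is how I checked; the sys.modules lookup is just my guess at where the object might still live:

# in test.py, right after the import
import sys

print('LARGEDF' in globals())                   # False: not in test.py's namespace
print(hasattr(sys.modules['func'], 'LARGEDF'))  # guess: the module object may still hold it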

Can I conclude that this is one of the drawbacks that defining large objects globally in a module leads to?

Note: even if I replace return LARGEDF.copy() with pass, the size of this large DataFrame is still added to the Python process's memory at runtime.
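In case the measurement method matters: the numbers above come from watching the process in the OS task manager. A minimal in-process check along these lines (using psutil, which is my assumption here, not what I actually used) should show the same pattern:

import os

import psutil  # assumption: psutil is installed

def rss_kb():
    # resident set size of the current process, in KB
    return psutil.Process(os.getpid()).memory_info().rss // 1024

print('before import: %d KB' % rss_kb())
from func import run
print('after import: %d KB' % rss_kb())  # in my test, the jump had already happened by this point
run()
print('after run(): %d KB' % rss_kb())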

Thanks!
