I have a directory of JSON files to read, so I use the following code:
import os
import pandas as pd

test_filelist = os.listdir('myDir')
df_test_list = [pd.read_json(os.path.join('myDir', file), lines=True) for file in test_filelist if file.endswith('json')]
df_test = pd.concat(df_test_list)
The total size of the directory is 4.5 GB, but when I use top
to check the memory my process uses, I see that it is using 30 GB once the read is done.
Why does this happen? I only read 4.5 GB of JSON files, yet 30 GB of memory is used. How can I avoid this?
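I wondered whether keeping every per-file DataFrame alive in df_test_list is part of the problem. Is dropping the list and forcing a garbage collection after the concat the right way to release that memory? This is just a sketch of what I had in mind; I have not confirmed it helps:

import gc
import os
import pandas as pd

test_filelist = os.listdir('myDir')
df_test_list = [pd.read_json(os.path.join('myDir', f), lines=True)
                for f in test_filelist if f.endswith('json')]
df_test = pd.concat(df_test_list)

# drop the per-file DataFrames and force a collection so the memory can be reused
del df_test_list
gc.collect()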
I also printed df_test.info(), and it told me that this DataFrame only uses 177.7 MB of memory. Why?
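Should I be measuring the size with the deep option instead? Something like this (df_test is the concatenated DataFrame from the code above):

# by default info() does not count the Python strings inside object columns;
# memory_usage='deep' walks them as well
df_test.info(memory_usage='deep')
print(df_test.memory_usage(deep=True).sum() / 1024**2, 'MB')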