I am trying to process a huge CSV file with pandas. First, I ran into a memory error when loading the file. I was able to fix that with this:
df = pd.read_csv('data.csv', chunksize=1000, low_memory=False)
device_data = pd.concat(df, ignore_index=True)
However, I still get memory errors when applying multiple filters to device_data.
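For illustration, the filtering looks roughly like this (the column names 'status' and 'value' and the thresholds are placeholders, not my real ones):

import pandas as pd

# load in chunks and concatenate into one frame, as above
df = pd.read_csv('data.csv', chunksize=1000, low_memory=False)
device_data = pd.concat(df, ignore_index=True)

# chained boolean filters of this kind are what trigger the memory error
filtered = device_data[
    (device_data['ID'] == 1234567)
    & (device_data['status'] == 'active')
    & (device_data['value'] > 100)
]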
Here are my questions:
1- Is there any way to get rid of memory errors when processing the DataFrame loaded from that huge CSV?
2- I have also tried adding a condition while concatenating the chunks from the iterator, following this question: [How can I filter lines on load in Pandas read_csv function?]
iter_csv = pd.read_csv('data.csv', iterator=True, chunksize=1000)
df = pd.concat([chunk[chunk['ID'] == 1234567] for chunk in iter_csv])
However, the number of results seems much lower than it should be. Does anyone have any advice?
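To check where rows are going missing, I can count how many rows each chunk contributes, roughly like this (just a sketch, using the same file and ID value as above):

import pandas as pd

total_rows = 0
matched_rows = 0
for chunk in pd.read_csv('data.csv', iterator=True, chunksize=1000):
    # count all rows scanned and the rows matching the ID filter
    total_rows += len(chunk)
    matched_rows += (chunk['ID'] == 1234567).sum()
print(total_rows, matched_rows)

One thing I am not sure about is whether the 'ID' column is inferred as a string in some chunks, in which case comparing against the integer 1234567 would silently miss those rows.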
Thanks.
Update on 2019/02/19:
I have managed to load the CSV this way. However, I noticed that the number of results (shown by df.shape) varies with different chunksize values:
iter_csv = pd.read_csv('data.csv', iterator=True, chunksize=1000)
df = pd.concat([chunk[chunk['ID'] == 1234567] for chunk in iter_csv])
df.shape
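To make that concrete, this is roughly how I am comparing the counts for different chunk sizes (the sizes here are just examples):

import pandas as pd

def count_matches(chunksize):
    # filter each chunk and count the surviving rows for a given chunksize
    iter_csv = pd.read_csv('data.csv', iterator=True, chunksize=chunksize)
    return sum(len(chunk[chunk['ID'] == 1234567]) for chunk in iter_csv)

print(count_matches(1000), count_matches(100000))

I would expect both calls to return the same number, but they do not.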