Loading a 12 GB CSV into Python and converting it into a DataFrame

Question

I want to load a 12 GB CSV file into Python and then analyze it. I tried this:

file_input_to_system = pd.read_csv(usrinput)

but it failed because the call consumed all my RAM.

My goal now is to read the file from disk without loading all of it into RAM at once. I searched and found this sample:

f = open("file_path","r")
for row in csv.reader(f):
    df = pd.DataFrame(row)
    print(df)
f.close()

But I am not sure how to modify it so that it reads the CSV and parses it into a DataFrame.

When I try the following, reading the file in chunks does not consume all my memory. However, as soon as I concatenate the chunks into a single DataFrame, all my memory is consumed.

chunksize = 100
df = pd.read_csv("C:/Users/user/Documents/GitHub/MyfirstRep/export_lage.csv",iterator=True,chunksize=chunksize)
df = pd.concat(df, ignore_index=True)
print(df)

If you can't read the _entire_ dataset in at once, you either need to (a) sample it down, or (b) read in only a handful of columns. P.S. calling `pd.DataFrame` on each row would be crazy slow. — Michael Griffiths, Oct 19 '16 at 02:54
`pd.read_csv` supports `chunksize` param which will load `N` rows that you pass, so you can read the file in chunks but you will not be able to store the entire df in memory. So you can either store the data in something like HDF5 file or just operate in chunks — EdChum, Oct 19 '16 at 07:54
What I want to do is load a huge dataset into a pd.DataFrame and perform statistical analysis on it. @EdChum, do you mean that I can't do that if I insist on putting all the data into a pd.DataFrame at once? It seems that I need to break my calculation into pieces mathematically and rejoin them later. — Ben Chan, Oct 21 '16 at 09:10
Maybe you have the wrong tool in mind. Look at options in the Hadoop ecosystem and / or using Spark / PySpark — openwonk, Oct 21 '16 at 18:24
[Large Data Workflows Using Pandas](https://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas/14268804#14268804) - from pandas developer Jeff Reback — Brad Solomon, Sep 19 '17 at 13:20
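
To make the suggestions in the comments concrete (read only the columns you need, process the file in chunks, and recombine partial results), here is a minimal sketch. It assumes the statistic you want can be built from running totals, as a mean and standard deviation can. The function name `chunked_mean_std`, the column name `x`, and the in-memory sample data are hypothetical and used only for illustration; `usecols` and `chunksize` are existing `pd.read_csv` parameters.

```python
import io

import pandas as pd


def chunked_mean_std(source, column, chunksize=100_000):
    """Mean and sample standard deviation of one column, computed chunk by chunk.

    Hypothetical helper: only one chunk is held in memory at a time.
    Requires at least two non-missing values in the column.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    # usecols parses only the requested column; chunksize makes read_csv
    # return an iterator of DataFrames instead of one large DataFrame.
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        values = chunk[column].dropna().astype("float64")
        count += len(values)
        total += values.sum()
        total_sq += (values ** 2).sum()
    mean = total / count
    # Sample variance rebuilt from the running sums (n - 1 in the denominator).
    variance = (total_sq - count * mean ** 2) / (count - 1)
    return mean, variance ** 0.5


# Small in-memory stand-in for the real file; pass a file path instead in practice.
sample = io.StringIO("x,y\n" + "\n".join(f"{i},{i * 2}" for i in range(1, 11)))
print(chunked_mean_std(sample, "x", chunksize=3))
```

The key difference from the `pd.concat` attempt in the question is that each chunk is reduced to a few numbers and then discarded, so memory use stays proportional to `chunksize` and not to the file size. The sum-of-squares formula is simple but can lose precision on data with a large mean and small spread; statistics that cannot be expressed as running totals (an exact median, for example) need a different approach, such as the on-disk HDF5 store mentioned above.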
