
I have 30 CSV files. How can I feed them as input in a for loop in pandas?

The files have names such as fileaa, fileab, fileac, filead, ....

And I would like to end up with one single output.

Usually I use `read_csv`, but here it fails with a memory error.

f = "./file.csv"
df = pd.read_csv(f, sep="/", header=0, dtype=str)

P.S. Only the first file has the column titles, but the number of columns is the same in every file.
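
To make the intent concrete, this is roughly the loop I have in mind (a rough sketch only; the `file??` glob pattern and the `combined.csv` output name are placeholders), though it still loads one whole file at a time:

import glob
import pandas as pd

# Placeholder pattern for the 30 split files (fileaa, fileab, ...).
files = sorted(glob.glob("./file??"))

# Only the first file carries the column titles; reuse them for the rest.
columns = pd.read_csv(files[0], sep="/", nrows=0, dtype=str).columns

with open("combined.csv", "w") as out:
    for i, f in enumerate(files):
        if i == 0:
            df = pd.read_csv(f, sep="/", header=0, dtype=str)
        else:
            df = pd.read_csv(f, sep="/", header=None, names=columns, dtype=str)
        # Write the header only once, then keep appending.
        df.to_csv(out, index=False, header=(i == 0))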

  • https://stackoverflow.com/questions/42931068/pandas-read-csv-memoryerror, is this what you are looking for? – Vinay Jul 06 '17 at 15:25
  • Details please: how many columns does each file have? how do you want the output formatted (perhaps with a small example)? what is the backtrace for the error you get? is `file.csv` one of the thirty files? – NickD Jul 06 '17 at 15:25
  • @Vinay Exactly! Thank you! I am trying the same code to see if it works; I will let you know if the `low_memory=True` flag works. –  Jul 06 '17 at 15:27
  • @Nick Each file has around 50 rows and numerous lines (5 GB of data). –  Jul 06 '17 at 15:28
  • @Vinay Doesn't work! :( Memory error. –  Jul 06 '17 at 15:29
  • try blaze: http://blaze.pydata.org ... Sounds like you either need to get a beefier machine (pandas is in-memory only), use something like blaze, or move straight to spark (which you can run locally if you want) and process it in smaller pieces. for blaze, you will probably need to recode the csv files into a database (sqlite comes built-in) or hdf5 (which pandas can handle directly). – Corley Brigman Jul 06 '17 at 15:32
  • You mean 50 columns? If each file is 5GB, then you'll need more than 150GB of memory to hold them all. If you don't have that much memory, then the only thing you can do is process them in small enough pieces (say 1000 rows at a time or 1000000 rows at a time: whatever your machine can do) to fit in memory and use the file system to hold intermediate results, then concatenate the results into a single file. Why do you want them in a single file? That's going to give you even more trouble in the future. – NickD Jul 06 '17 at 15:34
  • Exactly! That's my question! I have split this one huge file into 30 small files; now I want to load them in a loop and get the result in one single file. –  Jul 06 '17 at 15:43
  • @WaltReed Yes! Lots of operations! –  Jul 07 '17 at 08:03
  • @WaltReed Nope! –  Jul 07 '17 at 13:51
  • This will allow you to read the CSV file in chunks. Try playing around with the parameter. Maybe `chunksize=500000`, and if that's too much, bump it down (a chunked-reading sketch follows this comment thread). – Walt Reed Jul 07 '17 at 13:55
  • Okay! Thanks! I will try that :) –  Jul 07 '17 at 13:56
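
Following up on the `chunksize` suggestion in the comments, a minimal sketch of chunked reading, assuming a single input file and an output name `output.csv` (both placeholders); any per-chunk operations would go inside the loop:

import pandas as pd

# chunksize is a guess; bump it down if memory is still exhausted.
reader = pd.read_csv("./file.csv", sep="/", header=0, dtype=str, chunksize=500000)

first = True
for chunk in reader:
    # ... per-chunk operations go here ...
    chunk.to_csv("output.csv", mode="w" if first else "a",
                 index=False, header=first)
    first = False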

0 Answers