
I am trying to read 4,684 CSV files from a folder; each file has 2,000 rows and 102 columns and is about 418 kB. I am reading and appending them one by one using the code below.

import pandas as pd

df2 = pd.DataFrame()  # start from an empty frame; allFiles holds the CSV file paths
for file in allFiles:
    df = pd.read_csv(file, index_col=None, header=None)
    df2 = df2.append(df)  # appending inside the loop

This takes 4 to 5 hours to read all 4,684 files and append them into one dataframe. Is there any way to make this process faster? I am using an i7 with 32 GB of RAM.

Thanks

VMan
  • Try not to append the new dataframe inside a loop. It's more efficient to add each one to a list and use pd.concat to join them together after the loop (a sketch of this follows the comments). You could refer to [this](https://stackoverflow.com/questions/36489576/why-does-concatenation-of-dataframes-get-exponentially-slower) – Louis Jan 24 '19 at 20:08
  • Hi Louis, Thanks for the response. I got it – VMan Jan 24 '19 at 20:10
  • 1
    Link here: https://stackoverflow.com/questions/36489576/why-does-concatenation-of-dataframes-get-exponentially-slower – Louis Jan 24 '19 at 20:11
  • The link above would explain the issue with this well. Personally, I would try to find a way to do this in maybe bash or something. `cat *.csv > consolidated.csv` should be pretty performant, and should reduce your overhead to just one `read_csv`. – WGS Jan 24 '19 at 20:18
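Following Louis's suggestion, a minimal sketch of the list-and-concat approach. The folder path and the use of glob for file discovery are assumptions, not taken from the original post.

import glob
import pandas as pd

# Read each file into a list first, then concatenate once at the end.
# This avoids re-copying the ever-growing dataframe on every iteration.
# "csv_folder" is a placeholder for the actual directory.
allFiles = glob.glob("csv_folder/*.csv")
frames = [pd.read_csv(f, index_col=None, header=None) for f in allFiles]
df2 = pd.concat(frames, ignore_index=True)

With append inside the loop, every iteration copies everything read so far, so the total work grows quadratically with the number of files; a single concat copies each file's data only once.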
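And a rough Python equivalent of the cat-then-single-read approach WGS describes, which is safe here because the files have no header row. The file names and paths are placeholders, and it assumes each file ends with a newline.

import glob
import shutil
import pandas as pd

# Concatenate the raw files into one on disk, then parse them with a single
# read_csv call instead of thousands of separate ones.
with open("consolidated.csv", "wb") as out:
    for path in sorted(glob.glob("csv_folder/*.csv")):
        with open(path, "rb") as src:
            shutil.copyfileobj(src, out)

df = pd.read_csv("consolidated.csv", index_col=None, header=None)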

0 Answers