
I am using an i7 laptop with 8 GB of RAM (no viruses). I am trying to read a 2.2 GB CSV file into a pandas DataFrame using the following code:

  tp = pd.read_csv(r'..\train_ver2\train_ver2.csv', iterator=True, chunksize=10000)
  df = pd.concat(tp, ignore_index=True)
  print(df)

but my system just hangs every time I run this. Is there a problem with my code, or is my system not powerful enough? Any thoughts or suggestions are welcome. Thank you!
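Note that concatenating every chunk back together rebuilds the entire 2.2 GB file as a single DataFrame in memory, which on an 8 GB machine can exhaust RAM and make the system appear to hang. If the data can be handled piece by piece instead of all at once, a chunk-by-chunk pass keeps memory usage roughly constant. A minimal sketch, using the same file path as above; the per-chunk work (counting rows here) is only a placeholder for whatever processing is actually needed:

  import pandas as pd

  # Read the file in chunks and process each one as it arrives,
  # so only about 10,000 rows are held in memory at any time.
  total_rows = 0
  reader = pd.read_csv(r'..\train_ver2\train_ver2.csv', chunksize=10000)
  for chunk in reader:
      # placeholder work: count rows; replace with real per-chunk processing
      total_rows += len(chunk)
  print(total_rows)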

  • Googling for "python large csv site:stackoverflow.com" will show you a lot of existing questions to help fix this. You can certainly make your code handle the CSV file more intelligently. – skrrgwasme Oct 28 '16 at 18:28
  • You may also get better results from just reducing your `chunksize` parameter. – skrrgwasme Oct 28 '16 at 18:30
  • @skrrgwasme Thanks for the tip. I was finally able to run the query above, but it looks like the chunksize parameter is not working: the query returns all 11 million records instead of the 10,000 I specified, for some reason. – Uasthana Oct 28 '16 at 18:38
  • @skrrgwasme Oh, I just figured out what chunksize is actually for... my comment above looks stupid now. – Uasthana Oct 28 '16 at 18:51
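As the comment thread works out, `chunksize=10000` does not cap the result at 10,000 rows; it makes `read_csv` return an iterator that yields DataFrames of up to 10,000 rows each, and `pd.concat` then stitches all of those pieces back into the full roughly 11-million-row frame, which is why every record came back. A quick sketch of the difference, again using the file path from the question:

  import pandas as pd

  reader = pd.read_csv(r'..\train_ver2\train_ver2.csv', chunksize=10000)
  first_chunk = next(iter(reader))   # a single DataFrame of up to 10,000 rows
  print(first_chunk.shape)

  # pd.concat(reader, ignore_index=True) would reassemble the remaining chunks
  # into one full DataFrame, reproducing the memory problem described above.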

0 Answers