
This is my first time working with largish datasets (~5 GB), and I'm concerned about loading the data into pandas. I only have ~4 GB of free RAM on my laptop, and I'm worried that if I read the entire file into pandas, my laptop will crash.

If I load a 5 GB CSV file into pandas, will it take up 5 GB of memory? Will it take more?
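One way to get a concrete answer for a specific file is to read a small sample and extrapolate. This is a minimal sketch, assuming pandas is installed; the sample data and the total row count used here are made-up stand-ins for the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the first rows of the real CSV file
csv_sample = "a,b,c\n" + "\n".join(f"{i},{i * 2},name_{i}" for i in range(1000))

# Read only a sample of rows and measure its in-memory footprint.
# deep=True is needed to count the actual bytes held by string columns.
df_sample = pd.read_csv(io.StringIO(csv_sample), nrows=1000)
bytes_per_row = df_sample.memory_usage(deep=True).sum() / len(df_sample)

# Extrapolate to the full file; the row count here is hypothetical
# (a quick line count of the CSV would give the real number)
estimated_total = bytes_per_row * 50_000_000
print(f"~{estimated_total / 1e9:.1f} GB estimated in memory")
```

Object (string) columns in particular can make the in-memory size larger than the on-disk size, so measuring a real sample beats guessing from the file size.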

Nakul Upadhya
  • https://pandas.pydata.org/pandas-docs/stable/user_guide/scale.html – Daniel Jun 15 '22 at 14:07
  • The CSV file may be 5 GB on disk, but once loaded it will probably take less than that. Pandas also doesn't have to read all 5 GB at once; you can set the chunk size. For larger-than-memory datasets you should be using dask. Your computer won't crash, but the Python program will be very slow. – Tom McLean Jun 15 '22 at 14:08
  • [Download more ram](https://www.google.com/search?q=download+more+ram&sxsrf=ALiCzsbOGD-WmNUxtkt-QylEXv7nJrkVmw%3A1655302254278&source=hp&ei=buipYufwDc-_tQaBkouIBg&iflsig=AJiK0e8AAAAAYqn2fizocqquvb4ShVST5AvC2E5bGjfV&ved=0ahUKEwjn3--20a_4AhXPX80KHQHJAmEQ4dUDCAg&uact=5&oq=download+more+ram&gs_lcp=Cgdnd3Mtd2l6EAMyBQgAEIAEMgUIABCABDIFCAAQgAQyBAgAEAoyBAgAEAoyBQgAEIAEMgUIABCABDIFCAAQgAQyBQgAEIAEMgUIABCABFAAWABgxwJoAHAAeACAAVaIAVaSAQExmAEAoAECoAEB&sclient=gws-wiz)? Joking aside, even if you can load that CSV into a dataframe, you would be very limited in what you can do with it. – Quang Hoang Jun 15 '22 at 14:11
  • Note that if the CSV is plaintext, it will be slightly smaller in memory, but if it's compressed, it will be larger. So watch out for binary or .csv.gz files etc. Maybe try opening it in a text editor to see if you can read it. And yeah, I'd recommend that section of the pandas docs and dask for sure. – Michael Delgado Jun 15 '22 at 14:40
  • This question is relevant and has good tips in the answers and comments: https://stackoverflow.com/q/69153017/3888719 – Michael Delgado Jun 15 '22 at 15:14
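The chunked reading mentioned in the comments can be sketched as follows. This is a minimal example assuming pandas is installed; the tiny in-memory CSV stands in for the real 5 GB file, and the per-chunk aggregation is a placeholder for whatever you actually need to compute:

```python
import io

import pandas as pd

# Stand-in for the real 5 GB file on disk
csv_data = "value\n" + "\n".join(str(i) for i in range(10))

# chunksize makes read_csv return an iterator of DataFrames,
# so only one chunk is held in RAM at a time
total = 0
for chunk in pd.read_csv(io.StringIO(csv_data), chunksize=4):
    total += chunk["value"].sum()  # replace with your own per-chunk work

print(total)  # prints 45
```

This pattern works for aggregations and filters that can be computed chunk by chunk; for operations that need the whole dataset at once (sorts, many joins), a larger-than-memory tool like dask is the better fit.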

0 Answers