
I'm trying to read data from a .csv file in Jupyter Notebook (Python).

The .csv file is 8.5 GB, with 70 million rows and 30 columns.

When I try to read the .csv file, I get errors.

Below is my code:

import pandas as pd

log = pd.read_csv('log_20100424.csv', engine = 'python')

I also tried using pyarrow, but it didn't work either.

import pandas as pd
from pyarrow import csv

log = csv.read_csv('log_20100424.csv').to_pandas()

My questions are:

How can I read a huge (8.5 GB) .csv file in Jupyter Notebook?

Is there any other way to read a huge .csv file?

My laptop has 8 GB of RAM and an i5-8265U 1.6 GHz CPU, and runs 64-bit Windows 10.

jwowowo
  • Check out `dask`. It's a library that lets you work with big data on small computers by evaluating work lazily and loading only what your machine can handle. There's no way to load an 8.5 GB CSV with < 8 GB of RAM other than chunking it. – alkasm Apr 23 '20 at 17:39
  • Does this answer your question? [python - Using pandas structures with large csv(iterate and chunksize)](https://stackoverflow.com/questions/33642951/python-using-pandas-structures-with-large-csviterate-and-chunksize) – artemis Apr 23 '20 at 17:41
  • Can you post the errors you get? – Uwe L. Korn Apr 24 '20 at 07:21
  • pyarrow would help if you dropped the pandas dependency; pandas is what is crashing your computer. Even if you manage to read the file, you can't query the data with the memory your computer has. As a long-term alternative, you should use a backend database or Apache Spark on your computer; otherwise you will need better hardware or a temporary cloud service. – oetzi Apr 30 '20 at 20:22
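
To make the "backend database" suggestion from the last comment concrete, here is a minimal sketch that copies a CSV into a SQLite file in batches, using only the standard library. The function name `load_csv_into_sqlite`, the table name `log`, the batch size, and the tiny demo file are all hypothetical choices for illustration; it also assumes every row has the same number of fields as the header, and it stores every value as text.

```python
import csv
import os
import sqlite3
import tempfile
from itertools import islice


def load_csv_into_sqlite(csv_path, db_path, table="log", batch_size=50_000):
    """Copy a CSV into a SQLite table batch by batch; return rows inserted."""
    inserted = 0
    conn = sqlite3.connect(db_path)
    try:
        with open(csv_path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            cols = ", ".join(f'"{name}"' for name in header)
            marks = ", ".join("?" for _ in header)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
            while True:
                # Only batch_size rows are held in memory at any time.
                batch = list(islice(reader, batch_size))
                if not batch:
                    break
                conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', batch)
                conn.commit()
                inserted += len(batch)
    finally:
        conn.close()
    return inserted


# Tiny stand-in file so the sketch runs anywhere (hypothetical data).
workdir = tempfile.mkdtemp()
demo_csv = os.path.join(workdir, "demo.csv")
demo_db = os.path.join(workdir, "demo.db")
with open(demo_csv, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "status"])
    writer.writerows([[i, "ok" if i % 2 == 0 else "error"] for i in range(10)])

rows_loaded = load_csv_into_sqlite(demo_csv, demo_db, batch_size=4)

# Once loaded, queries run against the file on disk, not against RAM.
conn = sqlite3.connect(demo_db)
error_count = conn.execute(
    'SELECT COUNT(*) FROM "log" WHERE status = ?', ("error",)
).fetchone()[0]
conn.close()
print(rows_loaded, error_count)
```

After the one-time load, a query returns only the rows or aggregates you ask for, so the 8 GB RAM limit applies to the result rather than to the whole file.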

1 Answer


Even if pandas can handle huge data, Jupyter Notebook cannot. To read a huge CSV file, you need to work in chunks. I faced a similar situation where the Jupyter Notebook kernel would die and I had to start over. Try this: [image: Pandas Error Jupyter Notebook]
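
The code in this answer was posted as an image, so what follows is not the answerer's exact code. It is a minimal sketch of chunked reading with the `chunksize` parameter of `pd.read_csv`. The helper name `count_rows_in_chunks`, the chunk sizes, and the small demo file are hypothetical; for the file in the question you would pass `'log_20100424.csv'` instead.

```python
import os
import tempfile

import pandas as pd


def count_rows_in_chunks(path, chunksize=100_000):
    """Stream a CSV through pandas in fixed-size chunks and count its rows."""
    total = 0
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file into memory at once.
    for chunk in pd.read_csv(path, chunksize=chunksize):
        # Do the per-chunk work here (filter, aggregate, write out),
        # keeping only the small result rather than the chunk itself.
        total += len(chunk)
    return total


# Tiny stand-in file so the sketch runs anywhere (hypothetical data).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
pd.DataFrame({"a": range(10), "b": range(10)}).to_csv(demo_path, index=False)

row_count = count_rows_in_chunks(demo_path, chunksize=4)
print(row_count)
```

The important part is that each chunk is reduced to something small before the next one is read. Appending every chunk to a list and concatenating at the end would rebuild the full table in memory and hit the same limit.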

Venkataraman R
Varun Nagrare
  • It is good to post code as text, not as an image. Even if you post an image, it would be nice to have the code as well. – Magiczne Oct 07 '20 at 06:32
  • @Magiczne Yeah thanks for the suggestion. This is my first ever post so I'm a noob here. I wanted to show the output with it to be more detailed as I myself face problems just by blindly typing the code that I come across when solving my issues. So if I have an idea of how the output would be shown, it would help me understand better. That's why I posted an image. – Varun Nagrare Oct 07 '20 at 06:48