
I've just begun experimenting with Python for my thesis. I'd like to import a huge CSV file, so I decided to import it in smaller parts while skimming off the relevant data. The CSV to import is 20 GB, and here is the function I came up with:

"""importing 10e6 rows at the time...with 10e7 rows python crashes"""
import pandas as pd
import numpy as np

def screma_dati(file):
    i=1000000
    print("\n...begin skimming...")

    #first reading
    data_values=pd.read_csv(file,nrows=i)
    print("\n\t Dataset:\t"+file)
    print("\n\t part n: 1")

    #further readings
    length_rows=i
    j=i
    while length_rows = i
        except KeyboardInterrupt:
        data=pd.read_csv(file,header=None,nrows=i,skiprow=j)
        shp=np.shape(data)
        length_rows=shp[0]
        idx=data.loc[data["mbaddr"].isin(np_cod)]#np_cod are reference code
        data_values.append(idx)
        j+=i
        print("\n\t part n: " +str(j/i))

    print("\n...end skimming...")

    return data_values

It gives me a syntax error when compiling, and even though it is probably a trivial error I can't figure out how to solve it. I only started with Python a few days ago, so the function probably has a few more errors.

PS (off-topic): would this be a good way to import such a large dataset?

cektek1
  • Missing colon on the line with the while loop. Need to have try before except. Need to indent the code after except. – Ron May 16 '18 at 14:42
  • Off-topic: you should try parsing your file line by line or chunk by chunk. Here's the general idea: https://stackoverflow.com/questions/17444679/reading-a-huge-csv-file – sshashank124 May 16 '18 at 14:42
  • I added a colon after the while condition and it still doesn't work. – cektek1 May 16 '18 at 14:47
  • Can you double-check for correct whitespace (tabs vs. spaces)? You can use a show-whitespace feature in a text editor or winword. – Gabriel Fair May 16 '18 at 14:53
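Following up on the chunk-by-chunk suggestion above, here is a minimal sketch using the chunksize parameter of pd.read_csv, which returns an iterator of DataFrames so only one chunk is held in memory at a time. The function name skim_csv_chunked, its parameter names and the toy CSV are hypothetical; the "mbaddr" column and the isin filter come from the question. The filtered result still has to fit in memory.

```python
import io

import pandas as pd


def skim_csv_chunked(path_or_buffer, codes, chunksize=1_000_000, column="mbaddr"):
    """Stream a CSV in chunks, keeping only rows whose `column` is in `codes`.

    Hypothetical helper: `codes` plays the role of np_cod in the question.
    """
    kept = []
    # With chunksize, read_csv returns an iterator of DataFrames
    # (usable as a context manager in pandas >= 1.2).
    with pd.read_csv(path_or_buffer, chunksize=chunksize) as reader:
        for n, chunk in enumerate(reader, start=1):
            kept.append(chunk[chunk[column].isin(codes)])
            print("\t part n:", n)
    if not kept:
        return pd.DataFrame()
    # Concatenate once at the end instead of growing a DataFrame in the loop.
    return pd.concat(kept, ignore_index=True)


# Usage with a tiny hypothetical CSV standing in for the real file.
toy_csv = io.StringIO("mbaddr,value\n1,10\n2,20\n3,30\n1,40\n5,50\n")
result = skim_csv_chunked(toy_csv, codes=[1, 5], chunksize=2)
print(result)
```

Because the file is read once from start to end, this avoids re-scanning the skipped rows on every pass.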

2 Answers

while length_rows == i:
    pass

Also, your except has no matching "try":

try:
    ...  # code that may raise an exception
except Exception:
    ...  # handle the exception
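To see both fixes together outside pandas, here is a minimal, self-contained sketch of the loop structure. The names read_in_parts and read_part are hypothetical: read_part stands in for the pd.read_csv(...) call, and a plain list replaces the real file.

```python
def read_in_parts(read_part, part_size):
    """Call read_part(offset) until it returns fewer than part_size rows."""
    parts = []
    offset = 0
    length_rows = part_size  # force the first pass through the loop
    try:
        # The comparison needs '==' and the while line must end with ':'.
        while length_rows == part_size:
            part = read_part(offset)
            length_rows = len(part)
            parts.append(part)
            offset += part_size
    except KeyboardInterrupt:
        # Every 'except' belongs to a 'try'; Ctrl+C stops the loop
        # and the parts read so far are still returned.
        print("interrupted after", len(parts), "parts")
    return parts


# Usage: a plain list stands in for the CSV file (hypothetical data).
rows = list(range(10))
parts = read_in_parts(lambda offset: rows[offset:offset + 4], part_size=4)
print(parts)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Putting the try around the whole loop, rather than inside it, means a single Ctrl+C ends the reading instead of just skipping one iteration.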

Your syntax error is in the while line: the comparison needs == rather than =, and the line must end with a colon, as below. You are also using except incorrectly: it must follow a try block, so wrap the loop body in try and put the except after it.

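while length_rows == i:
    try:
        # header=None plus names= keeps the column names (e.g. "mbaddr");
        # skiprows=j + 1 skips the header line and the j rows already read
        data = pd.read_csv(file, header=None, names=data_values.columns,
                           skiprows=j + 1, nrows=i)
        length_rows = len(data)
        idx = data.loc[data["mbaddr"].isin(np_cod)]  # np_cod are the reference codes
        # DataFrame.append returns a new frame (and was removed in pandas 2.0)
        data_values = pd.concat([data_values, idx])
        j += i
        print("\n\t part n: " + str(j // i))

    except KeyboardInterrupt:
        break   # stop on Ctrl+C and return what has been read so far

print("\n...end skimming...")

return data_values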

Hope this solves your problem...
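For reference, here is a runnable sketch of the whole function built on the same nrows/skiprows idea. It keeps the question's function name, but the parameters (codes, part_size, column) and the toy file are assumptions for illustration. Note that every pd.read_csv call has to scan past all skipped lines again, so on a 20 GB file the chunksize parameter of pd.read_csv is usually the better choice.

```python
import os
import tempfile

import pandas as pd


def screma_dati(path, codes, part_size=1_000_000, column="mbaddr"):
    """Read `path` part_size data rows at a time and keep only the rows
    whose `column` value is in `codes` (np_cod in the question)."""
    print("\n...begin skimming...")
    columns = list(pd.read_csv(path, nrows=0).columns)  # header line only
    kept = []
    j = 0                      # data rows read so far
    length_rows = part_size    # force the first pass through the loop
    try:
        while length_rows == part_size:
            # skiprows=j + 1 skips the header line plus the j rows already
            # read; header=None with names= restores the column names.
            data = pd.read_csv(path, header=None, names=columns,
                               skiprows=j + 1, nrows=part_size)
            length_rows = len(data)
            kept.append(data[data[column].isin(codes)])
            j += part_size
            print("\t part n:", j // part_size)
    except KeyboardInterrupt:
        print("\n...interrupted, keeping the parts read so far...")
    print("\n...end skimming...")
    if not kept:
        return pd.DataFrame(columns=columns)
    return pd.concat(kept, ignore_index=True)


# Usage with a tiny hypothetical file standing in for the real CSV.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.csv")
    with open(toy_path, "w") as fh:
        fh.write("mbaddr,value\n1,10\n2,20\n3,30\n1,40\n5,50\n")
    result = screma_dati(toy_path, codes=[1, 5], part_size=2)
print(result)
```

Unlike the question's version, every part (including the first) goes through the same filter, and the kept parts are concatenated once at the end.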

Michael Yadidya