
I have an 18 GB CSV file from a measurement and want to do some calculations based on it. I tried to do it with pandas, but it seems to take forever just to read this file.

The following code is what I did:

import math as mt

import numpy as np
import pandas as pd

# read only the two measurement columns, in chunks, then concatenate
df = pd.read_csv('/Users/gaoyingqiang/Desktop/D989_Leistung.csv', usecols=[1, 2],
                 sep=';', encoding='gbk', iterator=True, chunksize=1000000)
df = pd.concat(df, ignore_index=True)

U1 = df['Kanal 1-1 [V]']
I1 = df['Kanal 1-2 [V]']

c = []
for num in range(0, 16333660, 333340):
    lu = sum(U1[num:num + 333340] * U1[num:num + 333340]) / 333340
    li = sum(I1[num:num + 333340] * I1[num:num + 333340]) / 333340
    lui = sum(I1[num:num + 333340] * U1[num:num + 333340]) / 333340
    c.append(180 * mt.acos(2 * lui / mt.sqrt(4 * lu * li)) / np.pi)

phase = pd.DataFrame(c)
phase.to_excel('/Users/gaoyingqiang/Desktop/Phaseverschiebung_1.xlsx', sheet_name='Sheet1')

Is there any way to accelerate the process?


1 Answer


You're reading the file in chunks of 1,000,000 rows, then concatenating everything into one huge DataFrame, and only then processing it. It would be quicker to read in a chunk, process it (and write the result out?), then read the next chunk.


In answer to your comment, when you

df_chunks = pd.read_csv(..., chunksize=1000000)

you don't get a DataFrame back; you get a TextFileReader object (from pandas.io.parsers), which is an iterator that yields one DataFrame chunk at a time.
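
As a minimal illustration (the file name here is just a placeholder, not your actual path):

import pandas as pd

# chunksize makes read_csv return an iterator of DataFrames instead of one DataFrame
reader = pd.read_csv('D989_Leistung.csv', sep=';', chunksize=1000000)
print(type(reader))   # a pandas TextFileReader, i.e. an iterator over DataFrame chunks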

I'm pretty sure you can do this:

for chunk in df_chunks:
    # work on the current chunk only
    U1 = chunk['Kanal 1-1 [V]']
    I1 = chunk['Kanal 1-2 [V]']

    c = []
    # iterate over the rows of this chunk, not over the whole 16,333,660-row file
    for num in range(0, len(U1), 333340):
        u = U1.iloc[num:num + 333340]
        i = I1.iloc[num:num + 333340]
        lu = sum(u * u) / 333340
        li = sum(i * i) / 333340
        lui = sum(i * u) / 333340
        c.append(180 * mt.acos(2 * lui / mt.sqrt(4 * lu * li)) / np.pi)

    phase = pd.DataFrame(c)
    # append phase to a csv file (i'd have to google how to do that but I'm sure you can)
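
For that last step, one way (just a sketch; the output file name is only an example) is DataFrame.to_csv with mode='a', writing the header only the first time:

import os

out_path = 'Phaseverschiebung_1.csv'   # example output path
phase.to_csv(out_path,
             mode='a',                             # append rather than overwrite
             header=not os.path.exists(out_path),  # write the header only once
             index=False)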

If you search around SO there are a few topics on this, e.g. How to read a 6 GB csv file with pandas
