
I have a big CSV file (18 GB) and I want to read it in chunks and then process it.

I have two problems here:

  1. How can I check whether the last chunk contains NaN, since the total number of rows in the CSV file is not evenly divisible by the chunksize?

  2. How can I write the new data into this existing xlsx file without overwriting the old data?
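On the first problem, note that pandas does not pad a short final chunk with NaN: the last chunk simply has fewer rows, so its length can be compared against the chunksize. A minimal sketch with made-up data:

```python
import io

import pandas as pd

# A tiny CSV standing in for the big file (made-up data).
csv_text = "a;b\n" + "\n".join(f"{i};{i * 2}" for i in range(10))

# Read 10 rows in chunks of 3: the last chunk is simply *shorter*,
# it is not padded with NaN.
sizes = [len(chunk)
         for chunk in pd.read_csv(io.StringIO(csv_text),
                                  delimiter=';', chunksize=3)]
print(sizes)  # → [3, 3, 3, 1]
```

So checking `len(chunk)` against the chunksize is enough to detect a short final chunk.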

Here's the code:

import math as mt

import numpy as np
import pandas as pd
from openpyxl import load_workbook

chunkSize = 6666800
periode = 333340
for chunk in pd.read_csv('/Users/gaoyingqiang/Desktop/D970_Leistung.csv',
                         delimiter=';', encoding='gbk', iterator=True,
                         chunksize=chunkSize):
    U1 = chunk['Kanal 1-1 [V]']
    I1 = chunk['Kanal 1-2 [V]']
    c = []
    if chunk.isnull.values.any():
        # Here I tried to check whether the last chunk contains NaN or 0
        # (by checking the last elements of U1) to avoid the ZeroDivisionError,
        # but I get: AttributeError: 'function' object has no attribute 'values'
        break
    for num in range(0, chunkSize, periode):
        lu = sum(U1[num:num + periode] * U1[num:num + periode]) / periode
        li = sum(I1[num:num + periode] * I1[num:num + periode]) / periode
        lui = sum(I1[num:num + periode] * U1[num:num + periode]) / periode
        c.append(180 * mt.acos(2 * lui / mt.sqrt(4 * lu * li)) / np.pi)

book = load_workbook('/Users/gaoyingqiang/Desktop/Phaseverschiebung_1.xlsx')
writer = pd.ExcelWriter('/Users/gaoyingqiang/Desktop/Phaseverschiebung_1.xlsx',
                        engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

phase = pd.DataFrame(c)
phase.to_excel(writer, 'Main')
writer.save()  # I found it keeps overwriting the old data.
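As a side check, the per-period phase formula in the loop above can be validated on synthetic signals (a minimal sketch; the 60° shift and the sample count are made up):

```python
import numpy as np

# Two sinusoids with a known 60° phase shift, sampled over
# exactly 10 full cycles so the period averages are exact.
n = 1000                                          # samples per window
t = np.arange(n)
U1 = np.sin(2 * np.pi * 10 * t / n)               # reference signal
I1 = np.sin(2 * np.pi * 10 * t / n + np.pi / 3)   # shifted by 60°

# Same quantities as in the loop: mean of U², mean of I², mean of U·I.
lu = np.sum(U1 * U1) / n
li = np.sum(I1 * I1) / n
lui = np.sum(U1 * I1) / n

# acos(2·lui / sqrt(4·lu·li)) reduces to acos(cos φ) = φ.
phase_deg = 180 * np.arccos(2 * lui / np.sqrt(4 * lu * li)) / np.pi
print(phase_deg)  # ≈ 60.0
```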

And here's the structure of the data (screenshot of the DataFrame omitted).

And there was an error from `if chunk.isnull.values.any()` (error screenshot omitted).

If I don't do this NaN check, I get a ZeroDivisionError instead (screenshot omitted).

So where did it go wrong?

Yingqiang Gao
  • Possible duplicate of [How to write to an existing excel file without overwriting data (using pandas)?](https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas) – DJK Aug 01 '17 at 20:04
  • checking for nans can be found [here](https://stackoverflow.com/questions/29530232/python-pandas-check-if-any-value-is-nan-in-dataframe) – DJK Aug 01 '17 at 20:06
  • I wrote `if chunk.isnull.values.any()` and it says AttributeError: 'function' object has no attribute 'values'. What's that? – Yingqiang Gao Aug 02 '17 at 06:46
  • can you provide a sample dataframe of a chunk with NaN's? – DJK Aug 02 '17 at 07:25
  • Like in a chunk there are 10 rows: the first 8 contain data and the last two contain nothing. It's like when you have data for 10 rows and you read it every 3 rows; the fourth chunk will only contain one row and the other rows are empty – Yingqiang Gao Aug 02 '17 at 07:43
  • Ok, but if you dont provide a sample, I cannot replicate the problem – DJK Aug 02 '17 at 07:45
  • yeah it's rather difficult to slice the sample since the data file was extremely big... – Yingqiang Gao Aug 02 '17 at 07:47
  • under `for chunk in pd.read_csv()` add `print(chunk.head(10))` then break loop and put the dataframe in your question – DJK Aug 02 '17 at 07:50
  • kindly see the new edition :) – Yingqiang Gao Aug 02 '17 at 07:57
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/150790/discussion-between-yingqiang-gao-and-djk47463). – Yingqiang Gao Aug 02 '17 at 08:03
  • you wrote `if chunk.isnull.values.any()` and it should be `if chunk.isnull().values.any()` – DJK Aug 02 '17 at 21:42
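The fix in the last comment can be verified on a tiny made-up DataFrame: `isnull` is a method, so it must be called with parentheses before `.values` exists.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with one NaN.
df = pd.DataFrame({'x': [1.0, np.nan]})

print(callable(df.isnull))       # True -- df.isnull alone is just the bound method
print(df.isnull().values.any())  # True -- called, it flags the NaN
```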

1 Answer


If you want to write all chunks into the same sheet, then try this:

Define a variable rowLength. For the 1st chunk, rowLength should be zero; after writing each chunk, increment it by the number of rows just written:

rowLength = 0                       # for the 1st chunk
rowLength = rowLength + len(phase)  # after writing each chunk

Then write each chunk to Excel by specifying startrow:

phase = pd.DataFrame(c)
phase.to_excel(writer, sheet_name='Main', startrow=rowLength, index=False)

See the pandas to_excel documentation for reference.
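The startrow pattern above can be sketched end to end; the file name, the sheet name 'Main', and the data values below are all made up for illustration:

```python
import pandas as pd

# Two stand-in "chunk results" (made-up values).
chunk_results = [pd.DataFrame({'phase': [1.0, 2.0]}),
                 pd.DataFrame({'phase': [3.0, 4.0]})]

path = 'Phaseverschiebung_demo.xlsx'   # hypothetical output file
rowLength = 0
with pd.ExcelWriter(path, engine='openpyxl') as writer:
    for phase in chunk_results:
        # Write the header only once, then append below the existing rows.
        phase.to_excel(writer, sheet_name='Main', startrow=rowLength,
                       index=False, header=(rowLength == 0))
        # Advance past the rows just written (+1 for the header on the 1st chunk).
        rowLength += len(phase) + (1 if rowLength == 0 else 0)

back = pd.read_excel(path, sheet_name='Main')
print(len(back))  # 4 -- both chunks landed in the sheet, nothing overwritten
```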

Geetha Ponnusamy