I have a big CSV file (18 GB) that I want to read in chunks and then process.
I have two problems here:
How can I check whether the last chunk contains NaN, given that the total number of rows in the CSV is not an exact multiple of the chunk size?
How can I write the new data into the existing xlsx file without overwriting the old data?
Here's the code:
import math as mt

import numpy as np
import pandas as pd
from openpyxl import load_workbook

chunkSize = 6666800
periode = 333340

for chunk in pd.read_csv('/Users/gaoyingqiang/Desktop/D970_Leistung.csv',
                         delimiter=';', encoding='gbk',
                         iterator=True, chunksize=chunkSize):
    U1 = chunk['Kanal 1-1 [V]']
    I1 = chunk['Kanal 1-2 [V]']
    c = []
    if chunk.isnull.values.any():
        # Here I try to check whether the last chunk contains NaN or 0
        # (to avoid a ZeroDivisionError), but this line raises:
        # AttributeError: 'function' object has no attribute 'values'
        break
    for num in range(0, chunkSize, periode):
        lu = sum(U1[num:num + periode] * U1[num:num + periode]) / periode
        li = sum(I1[num:num + periode] * I1[num:num + periode]) / periode
        lui = sum(I1[num:num + periode] * U1[num:num + periode]) / periode
        c.append(180 * mt.acos(2 * lui / mt.sqrt(4 * lu * li)) / np.pi)
        lu = 0
        li = 0
        lui = 0

    book = load_workbook('/Users/gaoyingqiang/Desktop/Phaseverschiebung_1.xlsx')
    writer = pd.ExcelWriter('/Users/gaoyingqiang/Desktop/Phaseverschiebung_1.xlsx',
                            engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    phase = pd.DataFrame(c)
    phase.to_excel(writer, 'Main')
    writer.save()  # I found it keeps overwriting the old data.
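For reference, this is roughly how I imagine the per-chunk processing should work: isnull called as a method, and the inner loop bounded by the number of complete periods actually present in the chunk, so the shorter last chunk cannot produce empty slices. This is only an untested sketch; phase_angles is just a name I made up, and the column names are the ones from my code above:

import math as mt

import numpy as np
import pandas as pd

def phase_angles(chunk, periode):
    # skip chunks that contain NaN; note that isnull() is a method and must be called
    if chunk.isnull().values.any():
        return []
    U1 = chunk['Kanal 1-1 [V]'].to_numpy()
    I1 = chunk['Kanal 1-2 [V]'].to_numpy()
    c = []
    # only loop over complete periods that exist in this chunk, so the shorter
    # last chunk cannot yield empty slices (whose sums are 0 and lead to 0/0)
    n_complete = (len(chunk) // periode) * periode
    for num in range(0, n_complete, periode):
        u = U1[num:num + periode]
        i = I1[num:num + periode]
        lu = np.sum(u * u) / periode
        li = np.sum(i * i) / periode
        lui = np.sum(i * u) / periode
        c.append(180 * mt.acos(2 * lui / mt.sqrt(4 * lu * li)) / np.pi)
    return c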
And here's the structure of the data (the relevant columns are Kanal 1-1 [V] and Kanal 1-2 [V]):
The check if chunk.isnull.values.any() fails with AttributeError: 'function' object has no attribute 'values'.
If I leave out the NaN check, the last chunk raises a ZeroDivisionError instead.
So where does it go wrong?
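For the second problem, my guess is that I have to keep track of how many rows the sheet already has and pass a startrow to to_excel, roughly like this (again only a sketch, reusing the writer.book pattern from my code above; append_to_sheet is a made-up helper, and the 'Main' sheet is assumed to already exist in the workbook):

import pandas as pd
from openpyxl import load_workbook

def append_to_sheet(df, path, sheet='Main'):
    # re-open the existing workbook so the sheets already in the file are kept
    book = load_workbook(path)
    writer = pd.ExcelWriter(path, engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    # start writing below whatever is already in the sheet instead of at the top
    startrow = writer.sheets[sheet].max_row
    df.to_excel(writer, sheet, startrow=startrow, header=False, index=False)
    writer.save()

# inside the chunk loop, instead of the last block of my code:
# append_to_sheet(pd.DataFrame(c), '/Users/gaoyingqiang/Desktop/Phaseverschiebung_1.xlsx')

Is that the right direction, or is there a cleaner way to append per chunk?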