I have a CSV file with 100 million rows and a PC with 14 GB of RAM. I split the file into two parts of 50 million rows each. I have been waiting two days for the script to finish executing this code:
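# Left-pad Column1 with zeros to a width of 7 characters
df['Column1'] = df['Column1'].apply('{:0>7}'.format)
for index in df.index:
    if df.loc[index, 'Column2'] == 0.0 and df.loc[index, 'Column3'] == 0:
        # Keep the first six characters of Column1
        df.loc[index, 'Column4'] = df.loc[index, 'Column1'][:6]
    else:
        df.loc[index, 'Column4'] = 'F'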
If there were a way to simplify this code, would it reduce the execution time? Here is a sample of the data, including the Column4 values I want:
   Column1  Column2  Column3  Column4
0  5487964      1.0      2.0        F
1  5587694      0.0        0   558769
2  7934852      1.0        0        F
3  5487964      0.0      2.0        F
4  1111111      0.0        0   111111
5  5487964      1.0      2.0        F
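
For reference, this is a rough sketch of what I think a vectorized version of the loop could look like, with the whole column computed at once instead of row by row through `.loc`. It assumes Column4 should be 'F' whenever the condition fails (as in the sample above) and that Column1 holds non-negative integers. The small frame is rebuilt by hand from the sample, and `np.where`, `.str.zfill` and `.str[:6]` are my substitutions, not part of the original script.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table above; Column4 is computed below.
df = pd.DataFrame({
    "Column1": [5487964, 5587694, 7934852, 5487964, 1111111, 5487964],
    "Column2": [1.0, 0.0, 1.0, 0.0, 0.0, 1.0],
    "Column3": [2.0, 0.0, 0.0, 2.0, 0.0, 2.0],
})

# Zero-pad to 7 characters; for non-negative integers this matches
# apply('{:0>7}'.format).
df["Column1"] = df["Column1"].astype(str).str.zfill(7)

# Boolean mask evaluated for every row in one pass.
mask = (df["Column2"] == 0.0) & (df["Column3"] == 0)

# First six characters where the mask holds, 'F' elsewhere (assumed intent).
df["Column4"] = np.where(mask, df["Column1"].str[:6], "F")
```

On the six sample rows this gives the same Column4 as the table. I have not timed it on the full 50 million rows.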