I have a DataFrame df with 30 million rows of the form:
0;401
0;924
0;925
1;145
1;414
1;673
2;144
2;145
2;153
I need to extract the rows whose value in the first column (S1) is repeated many times (e.g. more than 100). I tried a crude method:
import pandas as pd

df1 = pd.DataFrame()
state_last = None
for index, row in df.iterrows():
    # skip iterations where this part of df has already been evaluated
    if row.loc['S1'] != state_last:
        state_last = row.loc['S1']
        temp = df.loc[df['S1'] == row['S1']]  # all rows sharing this S1 value
        if temp.shape[0] > 100:
            df1 = pd.concat([df1, temp])  # accumulate the matching rows
I also tried:
for i in range(19709):  # max number in df
    temp = df.loc[df['S1'] == i]
    if temp.shape[0] > 100:
        df1 = pd.concat([df1, temp])
But both methods are far too slow. Can this be done more quickly? Thanks in advance.
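For illustration, the selection described above can probably be written without an explicit Python loop. This is a minimal sketch, not a benchmarked solution. It assumes the columns are named S1 and S2 (S2 is a made-up name for the second column), that "repeated" means more than 100 occurrences as in the loops above, and the helper name rows_with_frequent_key is hypothetical:

```python
import pandas as pd


def rows_with_frequent_key(df, key="S1", min_count=100):
    """Keep the rows whose value in `key` occurs more than `min_count` times."""
    # transform("size") broadcasts each group's row count back onto its rows,
    # so `counts` is aligned with df's index and can be used as a boolean mask.
    counts = df.groupby(key)[key].transform("size")
    return df[counts > min_count]


# Tiny sample in the same shape as the data above ("S2" is an assumed name).
sample = pd.DataFrame(
    {"S1": [0, 0, 0, 1, 1, 2], "S2": [401, 924, 925, 145, 414, 144]}
)

# With a threshold of 2, only the S1 == 0 group (3 rows) is kept.
result = rows_with_frequent_key(sample, min_count=2)
```

The idea is that the group sizes are computed once for the whole frame, instead of re-filtering df for every distinct S1 value as the two loops do.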