I have the following DataFrame:
  location  tps_inter  sess_glob
0     loc1          0          0
1     loc1         79          0
2     loc1          3          0
3     loc1         17          0
4     loc2          0          0
5     loc2         46          0
6     loc3          0          0
I would like to group by location and set sess_glob to 1 on the first row of each group:
  location  tps_inter  sess_glob
0     loc1          0          1
1     loc1         79          0
2     loc1          3          0
3     loc1         17          0
4     loc2          0          1
5     loc2         46          0
6     loc3          0          1
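For reference, a minimal sketch of this first step on its own. It rebuilds the sample data above and assumes the rows of each location are already in the order I want; `is_first` is just a name I picked for illustration.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "location": ["loc1", "loc1", "loc1", "loc1", "loc2", "loc2", "loc3"],
    "tps_inter": [0, 79, 3, 17, 0, 46, 0],
    "sess_glob": [0] * 7,
})

# cumcount() numbers the rows inside each group 0, 1, 2, ...,
# so the first row of every location is the one numbered 0.
is_first = df.groupby("location").cumcount() == 0
df.loc[is_first, "sess_glob"] = 1
```

This only marks the group starts; it does not yet build the running index.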
Then, within each group, I want sess_glob to be a running index that depends on tps_inter: if tps_inter is less than 10, sess_glob keeps the value of the previous row; if it is greater than 10, sess_glob is the previous row's value + 1.
The desired result is:
  location  tps_inter  sess_glob
0     loc1          0          1
1     loc1         79          2
2     loc1          3          2
3     loc1         17          3
4     loc2          0          1
5     loc2         46          2
6     loc3          0          1
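One direction that might avoid the row-by-row loop is a per-group cumulative sum. The sketch below is only an assumption about how this could be written, not something I have verified on my full data: it treats "greater than 10" as a strict `> 10` (so a value of exactly 10 does not start a new index), and `new_session` is a name chosen for illustration.

```python
import pandas as pd

# Sample data copied from the tables above.
df = pd.DataFrame({
    "location": ["loc1", "loc1", "loc1", "loc1", "loc2", "loc2", "loc3"],
    "tps_inter": [0, 79, 3, 17, 0, 46, 0],
    "sess_glob": [0] * 7,
})

# True on every row whose tps_inter should bump the index.
new_session = df["tps_inter"] > 10

# Count those rows cumulatively inside each location, then add 1
# so that every group starts at 1.
df["sess_glob"] = new_session.astype(int).groupby(df["location"]).cumsum() + 1
```

On the sample data this gives 1, 2, 2, 3 for loc1, 1, 2 for loc2, and 1 for loc3, and the result stays a DataFrame in its original row order.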
This code works, but it becomes very slow as the number of rows increases:
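import pandas as pd

df1 = df.copy()
df1 = df1.iloc[0:0]  # empty frame with the same columns as df
gdf = df.groupby('location')
i = 1
for table, group in gdf:
    for row, data in group.iterrows():
        if data["tps_inter"] > 10:
            i = i + 1
        data['sess_glob'] = i
        # builds a new DataFrame on every row, which is the slow part
        df1 = pd.concat([df1, data.to_frame().T])
    i = 1  # reset the counter before the next group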
I think there is a better way to do this without the concatenation line, but I can't find it. My main problem is getting the result as a DataFrame rather than as a Series.
(I used the following question to write my code: "How to loop over grouped Pandas dataframe?")
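To make the DataFrame-versus-Series problem concrete, here is a sketch of a loop variant that keeps each group as a DataFrame and concatenates only once at the end. It is an illustration under the same `> 10` assumption as my loop; `pieces`, `sess`, and `result` are hypothetical names.

```python
import pandas as pd

# Sample data copied from the first table in the question.
df = pd.DataFrame({
    "location": ["loc1", "loc1", "loc1", "loc1", "loc2", "loc2", "loc3"],
    "tps_inter": [0, 79, 3, 17, 0, 46, 0],
    "sess_glob": [0] * 7,
})

pieces = []
for location, group in df.groupby("location"):
    group = group.copy()  # work on a copy so df itself is left untouched
    i = 1                 # counter restarts for every location
    sess = []
    for tps in group["tps_inter"]:
        if tps > 10:
            i += 1
        sess.append(i)
    group["sess_glob"] = sess  # assign the whole column in one step
    pieces.append(group)

# A single concat of whole groups instead of one concat per row.
result = pd.concat(pieces)
```

Because each piece is a whole group rather than a single row, `result` is a DataFrame and the column dtypes are preserved.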