I'm working with a large dataset of all police stops in my city since 2014. It has millions of rows, but a single stop can span multiple rows: if the police stopped a group of 4 people, the stop appears in the database as 4 separate rows even though it's all the same stop. I'm looking to create a new column, orderInStop, that numbers the people within each stop sequentially: the first person caught up in the stop would have a value of 1, the second person a value of 2, and so on.
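To make the goal concrete, here's a toy example with made-up values (the real dataset uses the Date_Time_Occur and Location columns shown in my code below):

import pandas as pd

# made-up sample: one stop involving 3 people, then a stop involving 1 person
df = pd.DataFrame({
    "Date_Time_Occur": ["2014-01-05 21:30"] * 3 + ["2014-01-06 09:15"],
    "Location":        ["Main & 1st"] * 3 + ["Oak & 5th"],
})

# desired result:
#     Date_Time_Occur    Location  orderInStop
# 0  2014-01-05 21:30  Main & 1st            1
# 1  2014-01-05 21:30  Main & 1st            2
# 2  2014-01-05 21:30  Main & 1st            3
# 3  2014-01-06 09:15   Oak & 5th            1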
To do so, I have used the groupby() function to group all rows that match on time and location, which indicates that the rows are all part of the same stop. I can manage to create a new column that holds the TOTAL count of people in the stop (so if there were 4 rows with the same time and location, all four rows get a value of 4). But I need the first row in each group to have a value of 1, the second a value of 2, the third 3, and the fourth 4.
Below is my attempt at iterating through each group to count its rows sequentially, but the code doesn't quite work: each assignment overwrites the entire column rather than a single row within the group. Any help tweaking this code would be much appreciated!
Note: I also tried using logical operators in a for loop, essentially asking IF the time and location values match between the current and previous rows, but I ran into too many 'The truth value of a Series is ambiguous' errors, so instead I'm trying groupby().
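For context, that earlier loop looked roughly like this (reconstructed from memory, so treat the details as approximate):

for i in range(1, len(df)):
    # comparing two columns yields a whole boolean Series, not a single
    # True/False, which is what raises the ambiguous-truth-value error
    if df["Date_Time_Occur"] == df["Date_Time_Occur"].shift():
        ...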
Attempt that creates a total count rather than sequential count:
df['order2'] = df.groupby(by=["Date_Time_Occur", "Location"])['orderInStop'].transform('count')
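(On rows like the toy example above, this fills order2 with each group's size repeated for every row in it, e.g. 3, 3, 3, 1, rather than a running count within the group.)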
Attempt that fails, to iterate through each row in each group:
df['order3'] = 1
grp = df.groupby(by=["Date_Time_Occur", "Location"])
for name, groups in grp:
    count = 1
    for row in groups:        # iterates over the group's COLUMN NAMES, not its rows
        df['order3'] = count  # assigns count to the entire column, not one row
        count = count + 1
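For reference, the kind of per-group assignment I think I need looks something like this (a minimal sketch on my part, assuming .loc with the group's index labels targets just those rows; there may well be a more idiomatic built-in I'm missing):

df['order3'] = 1
grp = df.groupby(by=["Date_Time_Occur", "Location"])
for name, group in grp:
    # number only this group's rows 1..n via their original index labels
    df.loc[group.index, 'order3'] = range(1, len(group) + 1)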