0

I have columns called country and city. Grouping by countries I want to calculate how many times a city appears in succession.

Example input:

country     city
France      Paris
France      Paris
France      Nice
France      Paris
France      Paris
Germany     Munich
Germany     Berlin
Germany     Berlin
Switzerland Geneva
Switzerland Geneva
Switzerland Geneva

Example output:

country     city    repetition_in_succession
France      Paris   1
France      Paris   2
France      Nice    1
France      Paris   1
France      Paris   2
Germany     Munich  1
Germany     Berlin  1
Germany     Berlin  2
Switzerland Geneva  1
Switzerland Geneva  2
Switzerland Geneva  3

The 5th row here has 2 because we see Paris twice in a row. Note that we cannot assume that cities are unique to countries so we need to group on the country.

Also note that this question is NOT answered by How can I do a sequential count based on column value and timestamp in pandas? because I need the counter to restart when the city changes and changes back. This other similar question doesn't have that requirement.

Daniel Wyatt
  • 960
  • 1
  • 10
  • 29
  • 2
    This is a duplicate of the `cumcount` logic, but you need to group on the successive rows: `df.groupby(df.ne(df.shift()).any(axis=1).cumsum()).cumcount().add(1)` – mozway May 26 '23 at 13:27

0 Answers0