Given

import pandas as pd

df = pd.DataFrame({"LOCATION":["USA","USA","USA","USA","USA","USA","USA","JAPAN","JAPAN"],"CAR":["BMW","BMW","BMW","BMW","BMW","TOYOTA","FORD","BMW","FORD"],"SALE_DATE":[2017,2017,2017,2018,2018,2018,2019,2019,2019]})

This produces:

      CAR LOCATION  SALE_DATE
0     BMW      USA       2017
1     BMW      USA       2017
2     BMW      USA       2017
3     BMW      USA       2018
4     BMW      USA       2018
5  TOYOTA      USA       2018
6    FORD      USA       2019
7     BMW    JAPAN       2019
8    FORD    JAPAN       2019

What would be a Pythonic way to add a POSITION column based on the year? In other words, whenever a new year starts, the counter should restart at 0 and increase by one for each row until the next new year.

To illustrate, here is the desired result with a manually added POSITION column:

      CAR LOCATION  SALE_DATE  POSITION
0     BMW      USA       2017       0
1     BMW      USA       2017       1
2     BMW      USA       2017       2
3     BMW      USA       2018       0
4     BMW      USA       2018       1
5  TOYOTA      USA       2018       2
6    FORD      USA       2019       0 
7     BMW    JAPAN       2019       1
8    FORD    JAPAN       2019       2
adhg

2 Answers

Group by SALE_DATE and then use cumcount:

df['POSITION'] = df.groupby('SALE_DATE').cumcount()

    LOCATION   CAR    SALE_DATE  POSITION
0      USA     BMW       2017         0
1      USA     BMW       2017         1
2      USA     BMW       2017         2
3      USA     BMW       2018         0
4      USA     BMW       2018         1
5      USA  TOYOTA       2018         2
6      USA    FORD       2019         0
7    JAPAN     BMW       2019         1
8    JAPAN    FORD       2019         2
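To make explicit what cumcount computes: within each group it numbers the rows 0, 1, 2, ... in the order they appear in the frame. The sketch below reproduces that with a plain dictionary of counters and checks it against cumcount on the question's data. The helper name manual_position is hypothetical, for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "LOCATION": ["USA", "USA", "USA", "USA", "USA", "USA", "USA", "JAPAN", "JAPAN"],
    "CAR": ["BMW", "BMW", "BMW", "BMW", "BMW", "TOYOTA", "FORD", "BMW", "FORD"],
    "SALE_DATE": [2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019],
})


def manual_position(years):
    """Hypothetical helper: number rows within each year, starting at 0."""
    seen = {}  # year -> number of rows with that year encountered so far
    positions = []
    for year in years:
        positions.append(seen.get(year, 0))
        seen[year] = seen.get(year, 0) + 1
    return positions


df["POSITION"] = df.groupby("SALE_DATE").cumcount()

# Both approaches give the same numbering for this data.
assert df["POSITION"].tolist() == manual_position(df["SALE_DATE"])
print(df["POSITION"].tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2]
```

The pandas version is vectorised and avoids the explicit loop, which is why cumcount is the usual choice.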
yatu

Use cumcount:

df['POSITION'] = df.groupby('SALE_DATE').cumcount()
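One caveat with grouping on SALE_DATE: groupby pools every row with the same year wherever it appears, so a year that shows up again later continues counting from where it left off. If "new year" is meant as "the year differs from the previous row", one common pandas idiom is to group by a run ID built with shift and cumsum. A sketch under that assumption (position_by_run is a hypothetical helper name):

```python
import pandas as pd


def position_by_run(years: pd.Series) -> pd.Series:
    """Hypothetical helper: restart the count at 0 whenever the year changes."""
    # True wherever the value differs from the previous row (the first row is
    # compared against NaN, so it is always True); cumsum turns those flags
    # into an ID that increases by one at the start of each run.
    run_id = years.ne(years.shift()).cumsum()
    # Number the rows 0, 1, 2, ... within each consecutive run.
    return years.groupby(run_id).cumcount()


years = pd.Series([2017, 2017, 2018, 2017])
print(years.groupby(years).cumcount().tolist())  # [0, 1, 0, 2]
print(position_by_run(years).tolist())           # [0, 1, 0, 0]
```

For the data in the question every year is contiguous, so position_by_run(df["SALE_DATE"]) gives the same 0, 1, 2, ... numbering as the answers above; the two only differ once a year reappears after a different one.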
rafaelc