0

I have a dataframe like this:

account    date
  A        0812
  A        0812
  A        0812
  A        0823
  A        0823
  B        0723
  B        0730
  B        0730
  B        0801
  B        0801
  B        0801

I want to get the 'date' value for the first time the value changes per account. So the output I'm looking for is this:

account   date
  A       0823
  B       0730

I have tried to do a dense rank groupby function and filter by rank equaling 1.

df.groupby('account')['date'].rank(method='dense') but the output keeps the same rank for the same value, which does not work. 'first' and 'last' ranks don't seem to be working either.

Kdawg
  • 23
  • 3

1 Answers1

2

I believe you need DataFrame.drop_duplicates first and then get second value per group, by GroupBy.cumcount:

df1 = df.drop_duplicates(['account','date'])

df1 = df1[df1.groupby('account').cumcount().eq(1)]
print (df1)
  account  date
3       A   823
6       B   730

Or by GroupBy.nth:

df1 = df.drop_duplicates(['account','date'])

df1 = df1.groupby('account', as_index=False).nth(1)
print (df1)
  account  date
3       A   823
6       B   730
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252