
I am trying to get the most recent rows for every customer, regardless of the other attributes in the dataframe.

My dataframe looks like this:

[image: sample input dataframe]

My output should look like this:

[image: expected output]

I have tried 'df.iloc[df.groupby('customer')['date'].idxmax()]', but I get a ValueError:

ValueError Traceback (most recent call last)
----> 1 df = df.iloc[df.groupby('cutomer')['date'].idxmax()]

~\Anaconda3\envs\myenv\lib\site-packages\pandas\core\groupby\groupby.py in wrapper(*args, **kwargs)
653 if self.obj.ndim == 1:
654 # this can be called recursively, so need to raise ValueError
--> 655 raise ValueError
656
657 # GH#3688 try to operate item-by-item

ValueError:

sagar_c_k
  • Welcome to SO! What have you tried so far? – DaveIdito Nov 17 '20 at 20:30
  • Can you please remove the link to the image and post the data as plain text instead? It would be best to post it as a dataframe so it's easy for everyone to reproduce. – Joe Ferndz Nov 17 '20 at 20:35
  • It looks like you want the max date grouped by the other attributes. Have you tried any code yet? – Joe Ferndz Nov 17 '20 at 20:40
  • Does this answer your question? [How to get value of a column based on the maximum of another column in case of DataFrame.groupby](https://stackoverflow.com/questions/49263437/how-to-get-value-of-a-column-based-on-the-maximum-of-another-column-in-case-of-d) – Joe Ferndz Nov 17 '20 at 20:41
  • df.loc[df.groupby(['Customer','attr1','attr2'])['date'].idxmax()] – Joe Ferndz Nov 17 '20 at 20:43
  • 'df.loc[df.groupby(['Customer','attr1','attr2'])['date'].idxmax()]' won't work because it also groups by the other attributes in the dataframe. My output should depend only on the customer, and it should return all rows with the most recent date for that customer. – sagar_c_k Nov 17 '20 at 20:52
  • 'df.loc[df.groupby(['memberid'])['collected_on'].idxmax()]' should ideally work, but I get a ValueError. I then converted my "date" column, which held datetime objects, to date-only values using 'df['date'] = pd.to_datetime(df['date'].apply(lambda x: x.date()))'. After that I no longer get the ValueError, but the output is wrong. – sagar_c_k Nov 17 '20 at 20:59
  • Thank you all for the help :) – sagar_c_k Nov 17 '20 at 23:07
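For reference, here is a minimal sketch of the idxmax route discussed in the comments, run on a small made-up frame (the values, the extra `attr1` column and the non-default index labels are all hypothetical). It shows two things. First, `idxmax` returns index labels, so the result belongs in `.loc`, not `.iloc`. Second, it keeps exactly one row per customer, so when a customer has several rows on its latest date, all but the first are dropped; that may be why the output looked wrong once the timestamps were truncated to dates. Converting with `pd.to_datetime` first gives the column a datetime64 dtype, which is one way to avoid the ValueError if the original column had object dtype (an assumption about the asker's data).

```python
import pandas as pd

# Hypothetical sample data: customer "a" has two rows on its latest date.
df = pd.DataFrame(
    {
        "Customer": ["a", "a", "a", "b", "b"],
        "date": ["2020-11-01", "2020-11-05", "2020-11-05", "2020-10-01", "2020-11-02"],
        "attr1": [1, 2, 3, 4, 5],
    },
    index=[10, 11, 12, 13, 14],  # non-default labels show why .loc (not .iloc) is needed
)
df["date"] = pd.to_datetime(df["date"])  # strings -> datetime64

# idxmax returns the index *label* of the first maximum within each group
labels = df.groupby("Customer")["date"].idxmax()

# Label-based lookup; .iloc would treat 11 and 14 as positions and raise IndexError here
one_per_customer = df.loc[labels]
print(one_per_customer)
```

Note that row 12, which ties with row 11 on customer "a"'s latest date, is not in the result.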

1 Answer


I think this is essentially the same problem as this one: similar problem
In this case the code would look like this:

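import pandas as pd

df['date'] = pd.to_datetime(df['date'])  # make sure 'date' is a datetime64 column, not strings/objects
idx = df.groupby('Customer')['date'].transform('max') == df['date']  # True where a row has its customer's latest date
df[idx]  # every such row, including ties on the same date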
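To see what this returns, here is a small self-contained run on made-up data (the values and the extra `attr1` column are hypothetical; the column names `Customer` and `date` follow the code above). Because each row is compared against its customer's maximum rather than looked up by a single index label, customer "a" keeps both of its rows dated 2020-11-05. This matches the requirement in the comments to return all rows with the most recent date.

```python
import pandas as pd

# Hypothetical sample data; customer "a" has two rows on its latest date.
df = pd.DataFrame(
    {
        "Customer": ["a", "a", "a", "b", "b"],
        "date": ["2020-11-01", "2020-11-05", "2020-11-05", "2020-10-01", "2020-11-02"],
        "attr1": [1, 2, 3, 4, 5],
    }
)
df["date"] = pd.to_datetime(df["date"])

# transform('max') broadcasts each customer's latest date back onto every row
latest = df.groupby("Customer")["date"].transform("max")

# Boolean mask: keep every row that sits on its customer's latest date (ties included)
result = df[df["date"] == latest]
print(result)
```

If exactly one row per customer is enough, the `idxmax` + `.loc` variant from the comments gives the same result except that it drops ties.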
urirot