0

Good morning,

I'm trying to update a DataFrame based on the contents of two columns and am running into issues.

Specifically, I have a column called IP, another called VISITTIME. I've added two columns called OLDEST and NEWEST which need to contain the min and max VISITTIME for the IP of that row.

Using:

df2 = pd.merge(df.groupby('IP')['VISITTIME'].min().to_frame(), 
               df.groupby('IP')['VISITTIME'].max().to_frame(), on="IP") 

I can get the min and max times for each IP in the table. I can then iterate over that but I don't know how to update the original DataFrame.

Essentially what I'm asking is how do I do the following in pandas:

UPDATE df SET df.OLDEST = df2.OLDEST, df.NEWEST = df2.NEWEST WHERE df.IP=df2.IP; 

I feel like this should be easy and I'm ashamed that it isn't.

Thank you

user3471881
  • 2,614
  • 3
  • 18
  • 34
DClarke
  • 1
  • 2

1 Answers1

0

Depending on your version of pandas (I know in version > 0.22.0), there is a method called Dataframe.update.

That should provide some examples, but a few warnings:

  • You need to make sure that both dataframes are using IP as an index, or this method will not work.
  • The column (Series) names should be the same, so it will not require any location or column referencing.
  • You may choose to only overwrite NA values, but not applicable for what you are describing here.

DataFrame.update(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)

Modify in place using non-NA values from another DataFrame.

Aligns on indices. There is no return value.

rogersdevop
  • 66
  • 1
  • 3