I have a large panel dataset in a pandas DataFrame. The example data looks like this:

import pandas as pd 

df = pd.read_csv('example_data.csv')

df.head()

ID  Year    y   DOB Year_of_death   event
223725  1991    6   1975.0  2021    No
223725  1992    6   1975.0  2021    No
223725  1993    6   1975.0  2021    No
223725  1994    6   1975.0  2021    No
223725  1995    6   1975.0  2021    No

I want to update the column event so that a row has Yes only when its Year value equals its Year_of_death value; every other row should have No.

For example, ID 68084329 died in 2012 but has the value Yes in every row of the column event. I want only the row with Year 2012 for this ID to have Yes in event. All other event values for this ID should be No.

df.loc[df['ID'] == '68084329']

ID         Year    y    DOB  Year_of_death  event
68084329    1991    6   1942.0  2012    Yes
68084329    1992    5   1942.0  2012    Yes
68084329    1993    5   1942.0  2012    Yes
68084329    1994    6   1942.0  2012    Yes
68084329    1995    6   1942.0  2012    Yes
68084329    1996    5   1942.0  2012    Yes
68084329    1997    6   1942.0  2012    Yes
68084329    1998    5   1942.0  2012    Yes
68084329    1999    6   1942.0  2012    Yes
68084329    2000    6   1942.0  2012    Yes
68084329    2001    6   1942.0  2012    Yes
68084329    2002    5   1942.0  2012    Yes
68084329    2003    6   1942.0  2012    Yes
68084329    2004    5   1942.0  2012    Yes
68084329    2005    5   1942.0  2012    Yes
68084329    2006    6   1942.0  2012    Yes
68084329    2007    6   1942.0  2012    Yes
68084329    2008    6   1942.0  2012    Yes
68084329    2010    5   1942.0  2012    Yes
68084329    2011    5   1942.0  2012    Yes
68084329    2012    0   1942.0  2012    Yes

How do I make these changes for a large DataFrame with many IDs in accordance with the above conditions?

MI MA
  • 1
    Not sure where the ID matters here, so I think `df['event'] = np.where(df['Year'].eq(df['Year_of_death']), 'Yes','No')` would do it – Ben.T Sep 02 '20 at 19:54
  • Works very nicely here, thanks a lot for the help – MI MA Sep 02 '20 at 19:58
  • 2
    Does this answer your question? [Compare two columns using pandas](https://stackoverflow.com/questions/27474921/compare-two-columns-using-pandas) – RichieV Sep 02 '20 at 19:59
  • 2
    @Ben.T I feel that is the single best feature in SO, have others help you with the right search terms – RichieV Sep 02 '20 at 20:05
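
The `np.where` suggestion from the comments can be tried on a tiny frame. This is only a sketch: the rows below are made-up stand-ins for the question's data, and the column names are assumed to match the question exactly (`Year`, `Year_of_death`, `event`).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the panel: two IDs, a few years each.
df = pd.DataFrame({
    "ID": [68084329, 68084329, 68084329, 223725, 223725],
    "Year": [2010, 2011, 2012, 1991, 1992],
    "Year_of_death": [2012, 2012, 2012, 2021, 2021],
    "event": ["Yes", "Yes", "Yes", "No", "No"],
})

# Row-wise comparison; no grouping by ID is needed because
# Year_of_death is repeated on every row of the same ID.
df["event"] = np.where(df["Year"].eq(df["Year_of_death"]), "Yes", "No")
print(df)
```

Because every row is rewritten, rows that wrongly held Yes are set back to No in the same step.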

1 Answer

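df['event'] = 'No'  # reset every row first, since some IDs have Yes everywhere
df.loc[df['Year'] == df['Year_of_death'], 'event'] = 'Yes'  # mark only the death year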

That worked in a similar piece of code I was writing.
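
A minimal sketch of the boolean-mask assignment with `.loc`, using the column names from the question. The data is invented for illustration. Since the question's example has Yes on every row for one ID, the sketch assumes the column should be reset to No before the matching rows are marked. The final check (`yes_per_id`, a hypothetical helper name) counts Yes rows per ID.

```python
import pandas as pd

# Hypothetical sample: one ID that died in 2012, one that is still alive in the panel.
df = pd.DataFrame({
    "ID": [68084329, 68084329, 68084329, 223725, 223725],
    "Year": [2010, 2011, 2012, 1991, 1992],
    "Year_of_death": [2012, 2012, 2012, 2021, 2021],
    "event": ["Yes", "Yes", "Yes", "No", "No"],
})

# Reset first, otherwise rows that already hold Yes would stay Yes.
df["event"] = "No"

# Mark only the rows where the observation year is the year of death.
df.loc[df["Year"] == df["Year_of_death"], "event"] = "Yes"

# Sanity check: each ID should have at most one Yes row.
yes_per_id = df["event"].eq("Yes").groupby(df["ID"]).sum()
print(yes_per_id)
```

If the real data can contain more than one row per ID and Year, the count per ID may exceed one, so the check is worth running on the full frame.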

Z. Fralish