Adding new columns based on existing unique column combinations

Question

I have a data frame that looks like this:

fips	year	pollutant	nonattainment
72137	1992	Sulfur Dioxide (1971)
72137	1992	PM-2.5 (1997)	P
72137	1992	8-Hour Ozone (2015)	W
72137	1992	'Nitrogen Dioxide (1971)'
72137	1993	Sulfur Dioxide (1971)
72137	1993	PM-2.5 (1997)
72137	1993	8-Hour Ozone (2015)	W
72137	1993	'Nitrogen Dioxide (1971)'

FYI:

The nonattainment column holds the value P or W, or is empty.
The PM pollutants I care about in the pollutant column are the ones in this list: ['PM-2.5 (1997)', 'PM-2.5 (2006)', 'PM-10 (1987)', 'PM-2.5 (2012)'].

Task:

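I now want to add a new column called nonattainment_pm, which should contain the value 1 on every row of a fips-year combination if, within that combination, any row has a pollutant from the list above together with a nonattainment value of P or W.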

Expected output:

i.e. the new data frame should look like this:

fips	year	pollutant	nonattainment	nonattainment_pm
72137	1992	Sulfur Dioxide (1971)		1
72137	1992	PM-2.5 (1997)	P	1
72137	1992	8-Hour Ozone (2015)	W	1
72137	1992	'Nitrogen Dioxide (1971)'		1
72137	1993	Sulfur Dioxide (1971)
72137	1993	PM-2.5 (1997)
72137	1993	8-Hour Ozone (2015)	W
72137	1993	'Nitrogen Dioxide (1971)'

What have you tried? Have you checked this post: [Creating a new column based on if-elif-else condition](https://stackoverflow.com/q/21702342/10452700)? Kindly, Please Google or check previous similar questions to avoid duplication. You didn't include what you have tried and which error you have faced so far!! — Mario, Jul 29 '23 at 15:32
Obviously I looked at that answer and also googled my problem, if I had found a suitable solution I wouldn't have asked here. — futur3boy, Jul 29 '23 at 16:07

score 0 · Accepted Answer · answered Jul 29 '23 at 15:36

Here is one way:

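import numpy as np

pollutant_l = ['PM-2.5 (1997)', 'PM-2.5 (2006)', 'PM-10 (1987)', 'PM-2.5 (2012)']

# Flag rows whose pollutant is in the PM list and whose nonattainment is P or W
df['nonattainment_pm'] = np.where((df['pollutant'].isin(pollutant_l)) & (df['nonattainment'].isin(['P', 'W'])), 1, 0)
# Spread the flag to every row of the same fips-year group
df['nonattainment_pm'] = df.groupby(['fips', 'year'])['nonattainment_pm'].transform('max')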

Output:

    fips  year                  pollutant nonattainment  nonattainment_pm
0  72137  1992      Sulfur Dioxide (1971)           NaN                 1
1  72137  1992              PM-2.5 (1997)             P                 1
2  72137  1992        8-Hour Ozone (2015)             W                 1
3  72137  1992  'Nitrogen Dioxide (1971)'           NaN                 1
4  72137  1993      Sulfur Dioxide (1971)           NaN                 0
5  72137  1993              PM-2.5 (1997)           NaN                 0
6  72137  1993        8-Hour Ozone (2015)             W                 0
7  72137  1993  'Nitrogen Dioxide (1971)'           NaN                 0
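For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample rows from the question (assuming the blank nonattainment cells are NaN) and uses a boolean mask with transform("any") in place of the np.where and "max" steps; that is a variation on the approach, not a different result. The names pm_pollutants and is_pm_nonattainment are mine, chosen for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; blank nonattainment cells are assumed to be NaN.
df = pd.DataFrame({
    "fips": [72137] * 8,
    "year": [1992] * 4 + [1993] * 4,
    "pollutant": [
        "Sulfur Dioxide (1971)",
        "PM-2.5 (1997)",
        "8-Hour Ozone (2015)",
        "'Nitrogen Dioxide (1971)'",
    ] * 2,
    "nonattainment": [np.nan, "P", "W", np.nan, np.nan, np.nan, "W", np.nan],
})

pm_pollutants = ["PM-2.5 (1997)", "PM-2.5 (2006)", "PM-10 (1987)", "PM-2.5 (2012)"]

# True only on rows that are a PM pollutant AND carry a P or W flag.
is_pm_nonattainment = (
    df["pollutant"].isin(pm_pollutants) & df["nonattainment"].isin(["P", "W"])
)

# "Did any row in this fips-year group match?", broadcast back to every row.
df["nonattainment_pm"] = (
    is_pm_nonattainment.groupby([df["fips"], df["year"]])
    .transform("any")
    .astype(int)
)

print(df)
```

Keeping the mask boolean until the last step makes the intent ("any matching row in the group") explicit; the final astype(int) gives the same 1/0 column as above.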

Thanks a lot! This is exactly what I was looking for!! – futur3boy, Jul 29 '23 at 16:06
