Python Dataframe compare column values with a list and produce output with matching

Question

I have a dataframe with year-month as index. I want to assign a color to each row based on the year the sample was collected.

import matplotlib.colors as mcolors
colors_list = list(mcolors.XKCD_COLORS.keys())
print(colors_list)
['xkcd:cloudy blue',
 'xkcd:dark pastel green',
 'xkcd:dust',
 'xkcd:electric lime',
 'xkcd:fresh green',
 'xkcd:light eggplant',
........
]

print(df)
   sensor_value     Year    Month
0   5171.318942     2002    4
1   5085.094086     2002    5
3   5685.681944     2004    6
4   6097.877688     2006    7
5   6063.909946     2003    8
.....
years_list = df['Year'].unique().tolist()
req_colors_list = colors_list[:len(years_list)]

df['year_color'] = df['Year'].apply(lambda x: clr if x==year else np.nan for year,clr in zip(years_list,req_colors_list))

Present output:

<lambda>    <lambda>    <lambda>    <lambda>    <lambda>    <lambda>    <lambda>    <lambda>    <lambda>    <lambda>
Year                                        
2002    tab:blue    NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2002    tab:blue    NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2006    tab:blue    NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2006    tab:blue    NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2003    tab:blue    NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...

Expected output:

2002   'xkcd:cloudy blue'
2002   'xkcd:cloudy blue'
2006   'xkcd:fresh green'
2006   'xkcd:fresh green'
2003

score 2 · Answer 1 · answered Jun 16 '23 at 06:26

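Your current line passes a generator of lambda functions to apply, which pandas handles like a list of functions, hence one <lambda> column per year instead of one value per row. To assign a single color per row based on the year of the sample, use one lambda that returns one value: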

df['year_color'] = df['Year'].apply(lambda x: req_colors_list[years_list.index(x)] if x in years_list else np.nan)

This lambda function checks if the year x is present in the years_list. If it is, it retrieves the corresponding color from the req_colors_list using the index. Otherwise, it assigns np.nan to indicate missing values.
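A self-contained sketch of this approach, rebuilding a small frame from the sample rows in the question. The first XKCD color names are hard-coded instead of read from matplotlib; that is an assumption made only so the example runs on its own:

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (index 2 is missing there as well)
df = pd.DataFrame(
    {"sensor_value": [5171.318942, 5085.094086, 5685.681944, 6097.877688, 6063.909946],
     "Year": [2002, 2002, 2004, 2006, 2003],
     "Month": [4, 5, 6, 7, 8]},
    index=[0, 1, 3, 4, 5],
)

# Assumption: first entries of mcolors.XKCD_COLORS, hard-coded for this sketch
colors_list = ["xkcd:cloudy blue", "xkcd:dark pastel green", "xkcd:dust",
               "xkcd:electric lime", "xkcd:fresh green", "xkcd:light eggplant"]

years_list = df["Year"].unique().tolist()          # first-appearance order
req_colors_list = colors_list[:len(years_list)]    # one color per unique year

# One lambda returning one value per row: find the year's position in years_list
df["year_color"] = df["Year"].apply(
    lambda x: req_colors_list[years_list.index(x)] if x in years_list else np.nan
)
print(df)
```

Note that years_list.index(x) is a linear search done once per row, so on a large frame a dictionary lookup scales better.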

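Because req_colors_list is the first len(years_list) entries of colors_list, each unique year gets its own color. If colors_list had fewer colors than there are unique years, years_list.index(x) would point past the end of req_colors_list for the later years and raise an IndexError rather than repeat colors.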
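If you want colors to wrap around when there are more unique years than colors, one option (not taken from either answer) is to pair the years with itertools.cycle over the palette. A minimal sketch, using a deliberately short three-color palette chosen only for illustration:

```python
from itertools import cycle

import pandas as pd

df = pd.DataFrame({"Year": [2002, 2002, 2004, 2006, 2003]})

# Illustrative short palette: fewer colors than unique years
colors_list = ["xkcd:cloudy blue", "xkcd:dark pastel green", "xkcd:dust"]

years_list = df["Year"].unique().tolist()   # [2002, 2004, 2006, 2003]

# zip stops at the shorter iterable; cycle() never runs out, so every year
# gets a color and the palette wraps around (2003 reuses the first color)
year_to_color = dict(zip(years_list, cycle(colors_list)))
df["year_color"] = df["Year"].map(year_to_color)
print(df)
```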

jezrael · Accepted Answer · 2023-06-16T06:31:17.957

Use Series.map with a dictionary built by zip:

df['year_color'] = df['Year'].map(dict(zip(years_list, colors_list)))
print (df)
   sensor_value  Year  Month              year_color
0   5171.318942  2002      4        xkcd:cloudy blue
1   5085.094086  2002      5        xkcd:cloudy blue
3   5685.681944  2004      6  xkcd:dark pastel green
4   6097.877688  2006      7               xkcd:dust
5   6063.909946  2003      8      xkcd:electric lime
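A self-contained version of the map approach, with the sample rows from the question. The first XKCD color names are hard-coded so it runs without matplotlib; that hard-coding is an assumption of this sketch only:

```python
import pandas as pd

df = pd.DataFrame(
    {"sensor_value": [5171.318942, 5085.094086, 5685.681944, 6097.877688, 6063.909946],
     "Year": [2002, 2002, 2004, 2006, 2003],
     "Month": [4, 5, 6, 7, 8]},
    index=[0, 1, 3, 4, 5],
)
# Assumption: first entries of mcolors.XKCD_COLORS, hard-coded for this sketch
colors_list = ["xkcd:cloudy blue", "xkcd:dark pastel green", "xkcd:dust",
               "xkcd:electric lime", "xkcd:fresh green", "xkcd:light eggplant"]

# unique() keeps first-appearance order, so 2002 gets the first color
years_list = df["Year"].unique().tolist()

# zip pairs each year with a color (surplus colors are ignored),
# then map looks up each row's Year in the resulting dict
year_to_color = dict(zip(years_list, colors_list))
df["year_color"] = df["Year"].map(year_to_color)
print(df)
```

Because zip stops at the shorter input, there is no need to slice colors_list down to len(years_list) first.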

If there are fewer colors than unique years, zip drops the surplus years, so map generates NaN for them:

colors_list =['xkcd:cloudy blue',
              'xkcd:dark pastel green',
              'xkcd:dust']

years_list = df['Year'].unique().tolist()

df['year_color'] = df['Year'].map(dict(zip(years_list, colors_list)))
print (df)
   sensor_value  Year  Month              year_color
0   5171.318942  2002      4        xkcd:cloudy blue
1   5085.094086  2002      5        xkcd:cloudy blue
3   5685.681944  2004      6  xkcd:dark pastel green
4   6097.877688  2006      7               xkcd:dust
5   6063.909946  2003      8                     NaN
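If those rows should still get a color, one option (an addition, not part of the answer) is to fill the NaNs afterwards with Series.fillna. The fallback "xkcd:grey" below is an arbitrary, hypothetical choice:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2002, 2002, 2004, 2006, 2003]})
colors_list = ["xkcd:cloudy blue", "xkcd:dark pastel green", "xkcd:dust"]

years_list = df["Year"].unique().tolist()

# Four years but only three colors: zip drops 2003, so map yields NaN there
df["year_color"] = df["Year"].map(dict(zip(years_list, colors_list)))
print(df["year_color"].isna().sum())   # 1

# Hypothetical fallback color for years without an entry
df["year_color"] = df["year_color"].fillna("xkcd:grey")
print(df)
```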
