I want to generate a dataframe that contains lists of a person's potential favorite crayon colors, based on their favorite color. I have two dataframes that contain the necessary information:
df1 = pd.DataFrame({'person':['Jeff','Marie','Jenna','Mike'], 'color':['blue', 'purple', 'brown', 'green']}, columns=['person','color'])
df2 = pd.DataFrame({'possible_crayons':['christmas red','infra red','scarlet','sunset orange', 'neon carrot','lemon','forest green','pine','navy','aqua','periwinkle','royal purple'],'color':['red','red','red','orange','orange','yellow','green','green','blue','blue','purple','purple']}, columns=['possible_crayons','color'])
I want to reference one database against the other by matching the df1 color entry to the df2 color entry, and returning the corresponding possible_crayons values as a list in a new column in df1. Any terms that did not find a match would be labeled N/A. So the desired output would be:
person favorite_color possible_crayons_list
Jeff blue [navy, aqua]
Marie purple [periwinkle, royal purple]
Jenna brown NaN
Mike green [forest green, pink]
I've tried:
mergedDF = pd.merge(df1, df2, how='left')
However, this results in the following:
person color possible_crayons
0 Jeff blue navy
1 Jeff blue aqua
2 Marie purple periwinkle
3 Marie purple royal purple
4 Jenna brown NaN
5 Mike green forest green
6 Mike green pine
Is there any way to achieve my desired output of lists?