0

Let's say I originally have a dataframe df:

name age
0 Robert 21
1 Bob 22
2 Mike 23

And there's also a dataframe df2:

name attribute value
0 Robert email robert@email.com
1 Bob email bob@email.com
2 Robert married? Yes
3 Mike married? No
4 Mike employed? Yes
5 Robert have a car? Yes
6 Bob have a car? No

Beforehand, I already know that a person will have the attributes, for example, "age", "email", "married?", "employed?", "have a car?", "imigrant?", "student?".

How can I get a df_final like this:

name age email married? employed? have a car? imigrant? student?
0 Robert 21 robert@email.com Yes Yes
1 Bob 22 bob@email.com No
2 Mike 23 No Yes

I thought about creating the attributes columns by hand in df and then use df.loc[df['name']==df2['name'], df2['attribute']] to find find the correct indexes to insert the values, but I couldn't get it right.

  • `df.pivot(index='name', columns='attribute', values='value')` then merge the frames together. – It_is_Chris Mar 16 '23 at 15:07
  • Does this answer your question? [How can I pivot a dataframe?](https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe) – It_is_Chris Mar 16 '23 at 15:07

1 Answers1

1

You need to pivot the second data frame, then merge the result to the first. The merge is performed by matching the 'name' column from df to the index (also names) of the pivoted data.

So we need to pass left_on='name' to use the df.name column, and right_index=True to use the index from the pivot data.

To ensure the additional columns are present, we can add them after the merge.

df3 = df.merge(
   df2.pivot(index='name', columns='attribute', values='value'),
   how='left',
   left_on='name',
   right_index=True,
)

for col in ['email', 'married?', 'employed?', 'have a car?', 'imigrant?', 'student?']:
    if not col in df3:
        df3[col] = ''

You will end up with some NA values, which you can leave or fill with .fillna.

James
  • 32,991
  • 4
  • 47
  • 70
  • What about the attribute columns that didnt't appear on `df2` ("imigrant?", "student?") ? I'm asking this because I already know which columns needs to be on `df_final`, but I also know that not all of them will appear on `df2`. – NicholasHenrique Mar 16 '23 at 17:09
  • 1
    added an update for the missing columns – James Mar 17 '23 at 11:37