Dummy coding in pandas with custom value weights

Question

I have data in the shape as follows:


pd.DataFrame({'id': [1,2,3], 'item': ['item_a', 'item_a', 'item_b'],
                  'score': [1,-1,1]})

id  item    score
1   item_a    1
2   item_a    -1
3   item_b    1

I want to get dummy codes for the item column, but I want them scored based on their values in the score column. If there are no observations, I want a 0 imputed. Like so:

id  item_a  item_b
1      1      0
2     -1      0
3      0      1

As you see, I want to capture that user id 1 liked item_a, that id 2 disliked item_a, and that user 3 did not interact with item_a. The id column is not unique per row - for example, user id 3 could have liked item_a, and that would be recorded as a new row in the original dataframe.

I've tried using get_dummies in pandas, but that method only calculates the number of observed values in the "item" column, it doesn't take into account the score values.

score 3 · Accepted Answer · answered Feb 11 '19 at 19:06

3

What you want, is a pivot_table:

df.pivot_table(values='score', index='id', columns='item', fill_value=0)
#item  item_a  item_b
#id                  
#1          1       0
#2         -1       0
#3          0       1

answered Feb 11 '19 at 19:06

Jondiedoop

3,303
9
24

Dani Mesejo · Answer 2 · 2019-02-11T19:25:14.700

1

You could use set_index + unstack:

import pandas as pd

df = pd.DataFrame({'id': [1,2,3], 'item': ['item_a', 'item_a', 'item_b'],
                  'score': [1,-1,1]})

print(df.set_index(['id', 'item'])['score'].unstack(fill_value=0))

Output

item  item_a  item_b
id                  
1          1       0
2         -1       0
3          0       1

If you want to exactly match your output just add this lines:

result.columns.name = None
result.reset_index(inplace=True)

Output

   id  item_a  item_b
0   1       1       0
1   2      -1       0
2   3       0       1

edited Feb 11 '19 at 19:25

answered Feb 11 '19 at 19:11

Dani Mesejo

61,499
6
49
76

df.set_index(['id', 'item'])['score'].unstack(fill_value=0) – BENY Feb 11 '19 at 19:19
Why the ['score']? – Dani Mesejo Feb 11 '19 at 19:20
You have multiple index in columns – BENY Feb 11 '19 at 19:20
@Wen-Ben THanks, updated the answer! – Dani Mesejo Feb 11 '19 at 19:25
1

Looks better now – BENY Feb 11 '19 at 19:25

Dummy coding in pandas with custom value weights

2 Answers2