Consider this Pandas dataframe:
df = pd.DataFrame({
'User ID': [1, 2, 2, 3],
'Cupcakes': [1, 5, 4, 2],
'Biscuits': [2, 5, 3, 3],
'Score': [0.65, 0.12, 0.15, 0.9]
})
ie.
User ID Cupcakes Biscuits Score
0 1 1 2 0.65
1 2 5 5 0.12
2 2 4 3 0.15
3 3 2 3 0.90
I want to tidy ("melt") this data so that the dessert type are separate observations. But I also want to keep the score for each user.
Using melt()
directly doesn't work:
df.melt(
id_vars=['User ID'],
value_vars=['Cupcakes', 'Biscuits'],
var_name='Dessert', value_name='Enjoyment'
)
...gives:
User ID Dessert Enjoyment
0 1 Cupcakes 1
1 2 Cupcakes 5
2 2 Cupcakes 4
3 3 Cupcakes 2
4 1 Biscuits 2
5 2 Biscuits 5
6 2 Biscuits 3
7 3 Biscuits 3
I've lost the score data!
I can't use wide_to_long()
because I don't have a common "stub name" for my dessert types.
I can't join or merge the tidied data with the original data because the tidied data is reindexed and the user ID is not unique for each observation.
How do I tidy this data but retain columns that aren't involved in the tidying?