-1

Dataframe consist of 3 rows: wine_id, taste_group and and evaluated matching score for each of that group:

wine_id taste_group score
22 tree_fruit 87
22 citrus_fruit 98
22 tropical_fruit 17
22 earth 8
22 microbio 6
22 oak 7
22 vegetal 1

How to achieve to make a separate column for each taste_group and to list scores in rows? Hence this:

wine_id tree_fruit citrus_fruit tropical_fruit earth microbio oak vegetal
22 87 98 17 8 6 7 1

There are 13 taste groups overall, along with more than 6000 Wines. If the wine doesn't have a score for taste_group row takes value 0.

I used

length(unique(tastes$Group))
length(unique(tastes$Wine_Id))

in R to question basic measures. How to proceed to wanted format?

AndrewGB
  • 16,126
  • 5
  • 18
  • 49
vulkanovski
  • 159
  • 7

3 Answers3

2

Assuming your dataframe is named tastes, you'll want something like:

library(tidyr)

tastes %>%
    # Get into desired wide format
    pivot_wider(names_from = taste_group, values_from = score, values_fill = 0)
Daniel Molitor
  • 644
  • 4
  • 11
  • 1
    instead of the `mutate` use the parameter `values_fill = 0` ie `tastes %>% pivot_wider(names_from = taste_group, values_from = score, values_fill = 0)` after which we can upvote – Onyambu Dec 26 '21 at 03:30
  • Nifty! However some values are in vector shape c(5,5)? Any idea why? @Onyambu – vulkanovski Dec 26 '21 at 03:53
  • @stepski011 thats because they hasve the same ID values. ie same `wine_id` and same `taste_group`. You could summarize them depending on what you want. eg by summing them, finding the mean etc. just add `values_fn = sum` – Onyambu Dec 26 '21 at 04:07
  • @Onyambu I see that last function provides the sum calculation - but not the one that I need. There is a problem with output: after pivot_wider function from above I get Error: Can't convert to . with couple warning messages considering values_fn. How to return only dataframe with score values? – vulkanovski Dec 26 '21 at 04:18
  • @stepski011 sorry cant tell without the data. – Onyambu Dec 26 '21 at 04:34
2

In R, this is called as the long-to-wide reshaping, you can also use dcast to do that.

library(data.table)

dt <- fread("
wine_id taste_group score
22  tree_fruit  87
22  citrus_fruit    98
22  tropical_fruit  17
22  earth   8
22  microbio    6
22  oak 7
22  vegetal 1
")

dcast(dt, wine_id ~ taste_group, value.var = "score")

#wine_id citrus_fruit earth microbio   oak tree_fruit tropical_fruit vegetal
#     <int>        <int> <int>    <int> <int>      <int>          <int>   <int>
#      22           98     8        6     7         87             17       1
Peace Wang
  • 2,399
  • 1
  • 8
  • 15
0

Consider reshape:

wide_df <- reshape(
     my_data, 
     timevar="taste_group", 
     v.names = "score", 
     idvar = "wine_id", 
     direction = "wide"
)
Parfait
  • 104,375
  • 17
  • 94
  • 125