1

Long time reader and first time poster, let's see how this goes...

I am working in R to create a summary of average out of pocket costs for different drugs based on different health care providers. In the data I have many more companies (~5000) than I do products (4). I know to start off by aggregating the out of pocket cost by product and health care provider as shown below:

avgdf <- aggregate(price ~ company + product, data= df, mean)
colnames(avgdf) <- c("company", "prod", "avg_price")

The resulting data frame looks like this: (Note for confidentiality reasons I cannot post the actual data but have to show a generic example)

company prod    avg_price
A       1       88
A       2       63
A       3       46
B       1       55
C       2       8
D       1       67
D       2       42
D       3       40
D       4       61
E       1       13
E       2       17
F       1       85
F       4       17 

I want to transform the data frame so that the prod column is split into 4 columns, one for each of the respective products, and the values of these 4 columns are filled in according to its company-product pair. In other words, I want the table to look like this:

company prod1.avg_price prod2.avg_price prod3.avg_price prod4.avg_price
A       88              63              46              NA
B       55              NA              NA              NA
C       NA              2               NA              NA
D       67              42              40              61
E       13              17              NA              NA
F       85              NA              NA              17 

I shouldn't have as many NA's in my dataset as there are in my example, but I want a solution that can handle it. My guess is to use reshape2 melt and dcast functions but I am not sure how to implement it. Thank you in advance for the help!

Yusuf
  • 11
  • 2
  • With dplyr and tidyr, `avgdf %>% mutate(prod = paste0('prod', prod, '.avg_price')) %>% spread(prod, avg_price)` – alistaire Jul 14 '16 at 03:04

1 Answers1

1

We can use dcast from data.table to reshape it to 'wide' format.

library(data.table)
dcast(setDT(avgdf), company~paste0("prod", prod, ".avg_price"), value.var = "avg_price")
# company prod1.avg_price prod2.avg_price prod3.avg_price prod4.avg_price
#1:       A              88              63              46              NA
#2:       B              55              NA              NA              NA
#3:       C              NA               8              NA              NA
#4:       D              67              42              40              61
#5:       E              13              17              NA              NA
#6:       F              85              NA              NA              17
akrun
  • 874,273
  • 37
  • 540
  • 662