Long time reader and first time poster, let's see how this goes...
I am working in R to create a summary of average out of pocket costs for different drugs based on different health care providers. In the data I have many more companies (~5000) than I do products (4). I know to start off by aggregating the out of pocket cost by product and health care provider as shown below:
avgdf <- aggregate(price ~ company + product, data= df, mean)
colnames(avgdf) <- c("company", "prod", "avg_price")
The resulting data frame looks like this: (Note for confidentiality reasons I cannot post the actual data but have to show a generic example)
company prod avg_price
A 1 88
A 2 63
A 3 46
B 1 55
C 2 8
D 1 67
D 2 42
D 3 40
D 4 61
E 1 13
E 2 17
F 1 85
F 4 17
I want to transform the data frame so that the prod
column is split into 4 columns, one for each of the respective products, and the values of these 4 columns are filled in according to its company-product pair. In other words, I want the table to look like this:
company prod1.avg_price prod2.avg_price prod3.avg_price prod4.avg_price
A 88 63 46 NA
B 55 NA NA NA
C NA 2 NA NA
D 67 42 40 61
E 13 17 NA NA
F 85 NA NA 17
I shouldn't have as many NA's in my dataset as there are in my example, but I want a solution that can handle it. My guess is to use reshape2
melt
and dcast
functions but I am not sure how to implement it. Thank you in advance for the help!