1

I have a dataset which has four columns of data.

I want to group the rows by two variables, and the columns by one variable

Here is an example of my data

df <- data.frame(
Price = rnorm(24), 
Grouping = rep(c("CD", "NW", "SMK", "ghd"),6),
Sub_grouping = c("CDapple", "NWapple", "SMKapple", "ghdapple",
               "CDPear", "NWpear", "SMKpear", "ghdpear",
               "CDgrape",  "NWgrape", "SMKgrape", "ghdgrape",
               "CDapple", "NWapple", "SMKapple", "ghdapple",
               "CDPear", "NWpear", "SMKpear", "ghdpear",
               "CDgrape",  "NWgrape", "SMKgrape", "ghdgrape"),
SP = rep(c("SP", "OffSP"),12))

To get the average of the Price variable for each subgroup, I can run the following:

df <- melt(df)
df_mean <- dcast(df, Grouping + Sub_grouping ~ SP, value.var = "value",  fun.aggregate = mean)

I would also like the average value of Price for each grouping variable. Is this possible?

I also would like to count the number of values for Price which feed into each average Price. So for each Group, by SP and OffSP, the number of prices which feed into this; and for each sub_group, by SP and OffSP, the number of prices which feed into this.

Does anyone know how to do this?

I've seen these questions Create a column with a count of occurrences in R How can I count the number of instances a value occurs within a subgroup in R? But their contingency tables are 2x2, and I need a table with grouping and sub-group as rows, with SP / OffSP as columns.

Thanks

Community
  • 1
  • 1
Laura
  • 499
  • 5
  • 13

1 Answers1

2

We don't need to reshape it to 'long' format to get the mean values

library(dplyr)
df %>% 
   group_by(Grouping) %>% #first grouping
   #create the mean column and the count by 'Grouping'
   mutate(AvgPrice = mean(Price), n1 = n()) %>% 
   group_by(Sub_grouping, add= TRUE) %>% #second grouping
   #summarise to get the mean within Sub_grouping and count the values with n()
   summarise(AvgPrice = first(AvgPrice), n1 = first(n1), AvgPrice2 = mean(Price), n2 = n())

NOTE: If we also need to group by 'SP', then change the first group_by statement to

df
  %>%
   group_by(Grouping, SP) %>%
   ...
   ...

If we want to get the mean and length for each 'SP' and wants as separate column, one compact option is dcast from data.table which can take multiple functions and multiple value.var columns

library(data.table)
dcast(setDT(df), Grouping + Sub_grouping ~ SP, value.var = "Price", c(mean, length))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thank you very much. Any idea how to get the counts? Cheers. – Laura Mar 29 '17 at 03:43
  • @Laura Based on your description, do you need `SP` also as grouping variable – akrun Mar 29 '17 at 03:47
  • Yes please @akrun, if that's possible. Thank you. – Laura Mar 29 '17 at 03:49
  • @Laura In that case, just change the first group by, i.e. `df %>% group_by(Grouping, SP) %>% mutate(AvgPrice = mean(Price), n1 = n()) %>% group_by(Sub_grouping, add= TRUE) %>% summarise(AvgPrice = first(AvgPrice), n1 = first(n1), AvgPrice2 = mean(Price), n2 = n())` – akrun Mar 29 '17 at 03:49
  • Perfect. Thanks so much. – Laura Mar 29 '17 at 03:53
  • This is great, but the problem is that the grouping by SP still groups by row. I want to group by SP for the columns. So I'd end up with 8 additional columns: average value and count for Group and Subgroup by SP, and average value and count for Group and Subgroup by OffSP. Do you know if this is possible? – Laura Mar 29 '17 at 04:05
  • Thanks @akrun. When I run this code I get the following error message Error in .fun(.value[0], ...) : 2 arguments passed to 'length' which requires 1 – Laura Mar 29 '17 at 20:22
  • @Laura It is working fine with the example data you showed. I am using `data.table_1.10.4` – akrun Mar 30 '17 at 02:54