ID  cat1 cat2 cat3    loss
1    A    B    D    2213.18
2    A    B    A    1283.60
3    A    B    B    3005.09
4    B    A    A    939.85
5    A    B    C    2763.85
6    A    A    A    5142.87

There are 116 categorical variables with differing numbers of levels, of which I have listed three. Below is the code I used to calculate mean(loss) for every level of a single variable:

a1<-summarise(group_by(ins,cat85), cat85_mean=mean(loss))

I need code that does this dynamically for the remaining variables, so that I end up with mean(loss) for every level of every categorical variable.

E.g., cat85 has 4 levels: A, B, C and D. The code should produce mean(loss) for each of them, for example A = 2000, B = 1234.5, C = 5667.5, D = 3465.2.

Thanks!

etienne
  • Please use `dput()` to provide your example data as shown in http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. This allows to copy/paste your example data. – tobiasegli_te Oct 18 '16 at 11:02
  • @tobiasegli_te Sure, will do from here on. Thanks! – Amit Miskin Oct 20 '16 at 05:03

2 Answers


First, get the categorical variable names into a vector. Assuming they all start with "cat":

nn <- grep("^cat", names(ins), value=TRUE)

Then find the mean-by value for each categorical variable:

foo <- lapply(nn,
              function(n, dat) {
                  # mean of loss within each level of column n
                  tapply(dat$loss, dat[[n]], mean)
              },
              ins[,c(nn,"loss")])

And name the list elements:

names(foo) <- nn
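
To make the steps above concrete, here is a minimal sketch run on the six sample rows from the question. The data frame is retyped by hand from the post, and the result list is called `means` (a name chosen here for illustration, not one from the original code).

```r
# Six sample rows retyped from the question
ins <- data.frame(
  ID   = 1:6,
  cat1 = c("A", "A", "A", "B", "A", "A"),
  cat2 = c("B", "B", "B", "A", "B", "A"),
  cat3 = c("D", "A", "B", "A", "C", "A"),
  loss = c(2213.18, 1283.60, 3005.09, 939.85, 2763.85, 5142.87)
)

# Names of all columns that start with "cat"
nn <- grep("^cat", names(ins), value = TRUE)

# One named numeric vector of level means per categorical column
means <- lapply(nn, function(n) tapply(ins$loss, ins[[n]], mean))
names(means) <- nn

means$cat2   # A = 3041.36, B = 2316.43
```

Each list element is indexed by level, so `means$cat3["D"]` pulls out a single mean.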
Jason
  • Hmm... I was thinking of a way to do this using dplyr, and it just occurred to me that there is a `summarize_each` function, but I don't think there's a `group_by_each` or similar... maybe this functionality doesn't exist? – Amit Kohli Oct 18 '16 at 11:28

Here's a solution using dplyr:

# one summary table per column whose name contains "cat"
lapply(grep("cat",names(ins), value = TRUE),function(x){
    summarise(group_by_(ins,.groups=x), catX_mean=mean(loss))
})

[[1]]
# A tibble: 2 × 2
  .groups  catX_mean
    <chr>      <dbl>
1       A 0.04570735
2       B 0.76317575

For brevity, I show only the result for the first column. Note that I used different values for "loss" than in your example data.
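
`group_by_()` has been deprecated since dplyr 0.7, so on a current installation the same idea might look like the sketch below. It assumes dplyr 1.0 or later (for `across()` and the `.groups` argument of `summarise()`) and reuses the six sample rows from the question. The names `cat_cols`, `means_by_level` and `mean_loss` are made up for illustration.

```r
library(dplyr)

# Six sample rows retyped from the question
ins <- data.frame(
  ID   = 1:6,
  cat1 = c("A", "A", "A", "B", "A", "A"),
  cat2 = c("B", "B", "B", "A", "B", "A"),
  cat3 = c("D", "A", "B", "A", "C", "A"),
  loss = c(2213.18, 1283.60, 3005.09, 939.85, 2763.85, 5142.87)
)

cat_cols <- grep("^cat", names(ins), value = TRUE)

# One summary table per categorical column; all_of() selects the
# column whose name is stored in the string x
means_by_level <- lapply(cat_cols, function(x) {
  ins %>%
    group_by(across(all_of(x))) %>%
    summarise(mean_loss = mean(loss), .groups = "drop")
})
names(means_by_level) <- cat_cols

means_by_level$cat1
```

Here the grouping column keeps its original name (`cat1`, `cat2`, ...) instead of `.groups`, which makes the tables easier to tell apart.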

tobiasegli_te
  • I am getting an empty list when I run this code. I expected it to produce mean(loss) for the different levels of each variable. I feel there is no other way than to create a separate dataset for each categorical variable. Please let me know if there is a better way to do this. – Amit Miskin Oct 19 '16 at 07:41
  • @AmitMiskin There was a typo in the code; please try running it again. – tobiasegli_te Oct 19 '16 at 07:54