Aggregate by multiple columns and reshape from long to wide

Question

There are some questions similar to this topic on SO but not exactly like my usecase. I have a dataset where the columns are laid out as shown below

     Id        Description          Value
     10        Cat                  19
     10        Cat                  20
     10        Cat                  5
     10        Cat                  13
     11        Cat                  17
     11        Cat                  23
     11        Cat                  7
     11        Cat                  14  
     10        Dog                  19
     10        Dog                  20
     10        Dog                  5
     10        Dog                  13
     11        Dog                  17
     11        Dog                  23
     11        Dog                  7
     11        Dog                  14

What I am trying to do is capture the mean of the Value column by Id, Description. The final dataset would look like this.

     Id       Cat         Dog 
     10       14.25       28.5
     11       15.25       15.25

I can do this in a very rough manner not very efficient like this

tempdf1 <- df %>%
  filter(str_detect(Description, "Cat")) %>%
   group_by(Id, Description) %>%
  summarize(Mean_Value = mean(Value) , na.rm = TRUE))

This is not very convenient. Any advise on how how to accomplish the expected results more efficiently is much appreciated.

This is really a combo of [_How to sum a variable by group?_](https://stackoverflow.com/q/1660124/4497050) and [_How to reshape data from long to wide format?_](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) — alistaire, Dec 22 '17 at 21:22

score 2 · Answer 1 · answered Dec 22 '17 at 21:09

2

I would do this with tapply:

with( dat, tapply(Value, list(Id,Description), mean))
     Cat   Dog
10 14.25 14.25
11 15.25 15.25

Does return a matrix object so don't try accessing with "$".

answered Dec 22 '17 at 21:09

IRTFM

258,963
21
364
487

score 1 · Answer 2 · answered Dec 22 '17 at 20:58

1

You can aggregate (calculate average) per groups using data.table and get wanted table format using dcast():

library(data.table)
foo <- setDT(d)[, mean(Value), .(Id, Description)]
#    Id Description    V1
# 1: 10         Cat 14.25
# 2: 11         Cat 15.25
# 3: 10         Dog 14.25
# 4: 11         Dog 15.25
dcast(foo, Id ~ Description, value.var = "V1")
#    Id   Cat   Dog
# 1: 10 14.25 14.25
# 2: 11 15.25 15.25

answered Dec 22 '17 at 20:58

pogibas

27,303
19
84
117

thanks Po, I am adverse about using data.table library. That library has given me nothing by headache. It works sometimes, most of the of the time it has crashed by system. – bison2178 Dec 22 '17 at 21:00
1

You can use dcast directly: `dcast(dat,Id~Description,mean)` – Onyambu Dec 22 '17 at 21:01

Onyambu · Accepted Answer · 2017-12-22T21:09:16.287

1

Use dcast or even acast from reshape2() package

dcast(dat,Id~Description,mean)
   Id   Cat   Dog
 1 10 14.25 14.25
 2 11 15.25 15.25

Base R might be abit longer:

 reshape(aggregate(.~Id+Description,dat,mean),direction = "wide",v.names  = "Value",idvar = "Id",timevar = "Description")
  Id Value.Cat Value.Dog
1 10     14.25     14.25
2 11     15.25     15.25

edited Dec 22 '17 at 21:09

answered Dec 22 '17 at 21:02

Onyambu

67,392
3
24
53

score 1 · Answer 4 · edited Dec 22 '17 at 21:14

1

You can do the summarise using dplyr and the transformation from long to wide using tidyr::spread:

library(dplyr)
library(tidyr)

df %>%
    group_by(Id, Description) %>%
    summarise(Mean = mean(Value)) %>% 
    spread(Description, Mean)

     Id   Cat   Dog
* <int> <dbl> <dbl>
1    10 14.25 14.25
2    11 15.25 15.25

edited Dec 22 '17 at 21:14

alistaire

42,459
4
77
117

answered Dec 22 '17 at 21:05

Lamia

3,845
1
12
19

Aggregate by multiple columns and reshape from long to wide

4 Answers4

Linked

Related