How to aggregate a data frame in R

Question

I have created the following minimal example.

I want to turn this data frame (melted into long format, so that there are three columns: Time, Room and ID)

df <- structure(list(
  Time = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3,
           3, 3, 3, 3, 3, 3, 3),
  Room = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "a", "a", "b", "b",
           "c", "c", "d", "d", "e", "e", "a", "a", "b","b", "c", "c", "d", "d",
           "e", "e"),
  ID   = c("A", NA, NA, NA, NA, NA, NA, "B", NA, NA, NA, NA, NA, "C", NA, "D",
           NA, "E", NA, "F", NA, NA, NA, "G", NA, NA, NA, "H", NA, "I")),
  class     = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, 30L),
  .Names    = c("Time", "Room", "ID"))

into this data frame

   structure(
  list(
    Time = c(1, 2, 3),
    a = c("A", NA, NA),
    b = c(NA, "C", "G"),
    c = c(NA, "D", NA),
    d = c("B", "E", "H"),
    e = c(NA, "F", "I")
  ),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -3L),
  .Names = c("Time", "a", "b", "c", "d", "e")
)

which has the rooms as columns, the time as rows and the ID as entry.

I tried the following:

dcast(df, Time~Room, fun.aggregate=NULL, value.var='ID')

but this gives the message "Aggregation function missing: defaulting to length" and returns counts instead of the ID values, although the structure of the result looks right.

I also tried `aggregate`, but I can't work out how to use it here.

Related: [*dcast error: ‘Aggregation function missing: defaulting to length’*](https://stackoverflow.com/q/33051386/2204410) — Jaap, Aug 31 '18 at 05:37

score 2 · Accepted Answer · answered Mar 03 '16 at 22:19

You can write your own aggregation function to pull out the first value that isn't NA:

dcast(df, Time ~ Room, fun.aggregate = function(x){x[!is.na(x)][1]}, value.var = 'ID')

which returns

  Time    a    b    c d    e
1    1    A <NA> <NA> B <NA>
2    2 <NA>    C    D E    F
3    3 <NA>    G <NA> H    I

There may be a simpler way, but it works, at least. It does assume you won't have different non-NA values for ID for the same combination of Time and Room, so know your data.
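One way to "know your data" before relying on the first-non-NA aggregator is to count the distinct non-NA IDs in each Time/Room cell. The following is a minimal base-R sketch, not part of the original answer; the small data frame is made up for illustration (it deliberately puts two IDs, "E" and "X", in the same cell), and `n_ids` and `conflicts` are hypothetical names.

```r
# Hypothetical long-format data in the same shape as the question's df.
# Time 2 / Room d deliberately holds two different IDs ("E" and "X").
df <- data.frame(
  Time = c(1, 1, 1, 1, 2, 2, 2, 2),
  Room = c("a", "a", "d", "d", "b", "b", "d", "d"),
  ID   = c("A", NA, NA, "B", NA, "C", "E", "X"),
  stringsAsFactors = FALSE
)

# Number of distinct non-NA IDs in every Time/Room cell.
# Combinations that never occur in the data come back as NA.
n_ids <- tapply(
  df$ID,
  list(Time = df$Time, Room = df$Room),
  function(x) length(unique(x[!is.na(x)]))
)

# Cells with more than one distinct ID: taking only the first value
# would silently drop the others here.
conflicts <- which(n_ids > 1, arr.ind = TRUE)
print(n_ids)
print(conflicts)
```

If `conflicts` has no rows, taking the first non-NA value loses nothing; otherwise an aggregator that combines the values is probably what you want.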

alistaire

Thx. Your solution works on my minimal example, but when I apply it to my bigger data frame I only seem to get the first non null room per time. – Geoff Mar 03 '16 at 22:31
@Geoff - isn't this just a long-to-wide reshape - `reshape(df[complete.cases(df),], idvar="Time", timevar="Room", direction="wide")` ? Maybe try substituting `df[complete.cases(df),]` instead of `df` in this `dcast` answer – thelatemail Mar 03 '16 at 22:31
@thelatemail Maybe you are right. When I try it on my full data frame, though, I get exactly the same problem as in my comment above (except that your version also renames the columns, adding an `ID.` prefix). – Geoff Mar 03 '16 at 22:36
@Geoff Yeah I assumed there was only one (non-null) room per time. If there can be multiple, how do you want your data aggregated? You can write a function with `paste` or whatnot, but I'm not really sure that's a useful arrangement. – alistaire Mar 03 '16 at 22:38
@alistaire I could try a simple string concatenate. But how? – Geoff Mar 03 '16 at 22:43
You need to account for cases with all `NA` or you'll get empty strings. All told: `function(x){ifelse(all(is.na(x)), as.character(NA), paste(x[!is.na(x)], collapse = ', '))}` – alistaire Mar 03 '16 at 22:44
@alistaire Bingo! That's the jackpot. Many thanks indeed. – Geoff Mar 03 '16 at 22:48
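
For reference, the concatenating aggregator from the comments can be sketched as a named function and passed to `dcast`. This assumes the `reshape2` package is installed and that its `dcast` is the one in use; the sample data and the name `collapse_ids` are made up for illustration. An `if`/`else` replaces `ifelse` here, since the test is a single TRUE/FALSE value.

```r
library(reshape2)  # provides dcast(); assumed to be installed

# Hypothetical data: Time 2 / Room d holds two IDs, Time 1 / Room d holds none.
df <- data.frame(
  Time = c(1, 1, 2, 2, 2, 2),
  Room = c("a", "d", "a", "a", "d", "d"),
  ID   = c("A", NA, NA, NA, "E", "X"),
  stringsAsFactors = FALSE
)

# Join all non-NA IDs in a cell; return NA (not "") when there are none.
collapse_ids <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else paste(x, collapse = ", ")
}

wide <- dcast(df, Time ~ Room, fun.aggregate = collapse_ids, value.var = "ID")
print(wide)
```

Cells with several IDs end up as a single comma-separated string, so the result is handy for display but needs splitting again before any per-ID analysis.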
