Can dcast be used without an aggregate function?

Question

Possible Duplicate:
This R reshaping should be simple, but

dcast from reshape2 works without an aggregation function when there are no duplicate id/cat combinations. Take these example data:

df <- structure(list(id = c("A", "B", "C", "A", "B", "C"), cat = c("SS", 
"SS", "SS", "SV", "SV", "SV"), val = c(220L, 222L, 223L, 224L, 
225L, 2206L)), .Names = c("id", "cat", "val"), class = "data.frame", row.names = c(NA, 
-6L))

I'd like to dcast these data and just have the values tabulated, without applying any function to the value.var, including the default, length.

In this case, it works fine.

> dcast(df, id~cat, value.var="val")
  id  SS   SV
1  A 220  224
2  B 222  225
3  C 223 2206

But when an id/cat combination occurs more than once, the aggregation function defaults to length. Is there a way to avoid that?

df2 <- structure(list(id = c("A", "B", "C", "A", "B", "C", "C"), cat = c("SS", 
"SS", "SS", "SV", "SV", "SV", "SV"), val = c(220L, 222L, 223L, 
224L, 225L, 220L, 1L)), .Names = c("id", "cat", "val"), class = "data.frame", row.names = c(NA, 
-7L))

> dcast(df2, id~cat, value.var="val")
Aggregation function missing: defaulting to length
  id SS SV
1  A  1  1
2  B  1  1
3  C  1  2
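
The combination that triggers the fallback to length can be located before casting. This is a minimal base R sketch, assuming the same df2 as above; the names key_cols, has_dups and dup_rows are made up for illustration.

```r
df2 <- data.frame(
  id  = c("A", "B", "C", "A", "B", "C", "C"),
  cat = c("SS", "SS", "SS", "SV", "SV", "SV", "SV"),
  val = c(220L, 222L, 223L, 224L, 225L, 220L, 1L),
  stringsAsFactors = FALSE
)

# Columns that appear in the casting formula id ~ cat
key_cols <- c("id", "cat")

# TRUE if any id/cat combination occurs more than once
has_dups <- anyDuplicated(df2[key_cols]) > 0

# Rows that repeat an earlier id/cat combination
dup_rows <- df2[duplicated(df2[key_cols]), ]
dup_rows
```

If has_dups is FALSE, dcast has exactly one value per cell and should not need an aggregation function.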

Ideally, what I'm looking for is something like fun = NA, as in: don't try to aggregate the value.var at all. The result I'd like when dcasting df2:

  id  SS  SV
1  A 220 224
2  B 222 225
3  C 223 220
4  C  NA   1

Just add that in as another row with a `NA` for any missing values. — Maiasaura, Oct 11 '12 at 03:16
@Dason Is it kosher to answer my own question now that I figured out a solution? Or should I just delete the q? — Maiasaura, Oct 11 '12 at 03:41
It's definitely kosher. It's always nice seeing the different ways the problem can be solved. — Dason, Oct 11 '12 at 03:48

score 22 · Accepted Answer · answered Oct 11 '12 at 03:42

I don't think there is a way to do it directly, but we can add an additional column that tells the duplicates apart:

df2 <- structure(list(id = c("A", "B", "C", "A", "B", "C", "C"), cat = c("SS", 
"SS", "SS", "SV", "SV", "SV", "SV"), val = c(220L, 222L, 223L, 
224L, 225L, 220L, 1L)), .Names = c("id", "cat", "val"), class = "data.frame", row.names = c(NA, 
-7L))

library(reshape2)
library(plyr)
# Add a variable for how many times the id*cat combination has occurred
tmp <- ddply(df2, .(id, cat), transform, newid = paste(id, seq_along(cat)))
# Aggregate using this newid and toss in the id so we don't lose it
out <- dcast(tmp, id + newid ~ cat, value.var = "val")
# Remove newid if we want
out <- out[,-which(colnames(out) == "newid")]
> out
#  id  SS  SV
#1  A 220 224
#2  B 222 225
#3  C 223 220
#4  C  NA   1
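
The same occurrence counter can probably be built without plyr. This is a minimal sketch, assuming base R's ave() is acceptable here; the column name occ is hypothetical and not part of the answer above.

```r
library(reshape2)

df2 <- data.frame(
  id  = c("A", "B", "C", "A", "B", "C", "C"),
  cat = c("SS", "SS", "SS", "SV", "SV", "SV", "SV"),
  val = c(220L, 222L, 223L, 224L, 225L, 220L, 1L),
  stringsAsFactors = FALSE
)

# occ (hypothetical name) numbers repeated id/cat combinations 1, 2, ...
df2$occ <- ave(seq_along(df2$val), df2$id, df2$cat, FUN = seq_along)

# With occ in the formula every cell holds one value, so nothing is aggregated
out <- dcast(df2, id + occ ~ cat, value.var = "val")

# Drop the helper column once the table is cast
out$occ <- NULL
out
```

Here ave() plays the role of the ddply/transform step: it applies seq_along within each id/cat group and puts the results back in the original row order.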

Thanks, I arrived at the same conclusion. — Maiasaura, Oct 11 '12 at 03:44

score 9 · Answer 2 · answered Oct 11 '12 at 03:45

I figured out the same solution while Dason was answering my question.

I realized that dcast simply does not know how to deal with duplicates. The trick I found was to add another unique identifier so it doesn't get confused by them.

In this example:

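# Number the rows within each cat to get a second identifier (id2).
# This relies on the ids appearing in the same order within every cat.
df <- ddply(df2, .(cat), function(x) { x$id2 <- seq_len(nrow(x)); x })
> dcast(df, id + id2 ~ cat, value.var = "val")[, -2]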
  id  SS  SV
1  A 220 224
2  B 222 225
3  C 223 220
4  C  NA   1
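
Numbering within cat alone lines up here because the ids come in the same order in both categories. If the rows might arrive in a different order, one option is to number within each id/cat combination instead. This is a sketch under that assumption; shuffled and numbered are illustrative names, and the permutation is arbitrary.

```r
library(reshape2)
library(plyr)

df2 <- data.frame(
  id  = c("A", "B", "C", "A", "B", "C", "C"),
  cat = c("SS", "SS", "SS", "SV", "SV", "SV", "SV"),
  val = c(220L, 222L, 223L, 224L, 225L, 220L, 1L),
  stringsAsFactors = FALSE
)

# Reorder the rows with a fixed, arbitrary permutation
shuffled <- df2[c(7, 3, 5, 1, 6, 2, 4), ]

# Number the rows within each id/cat combination rather than within cat
numbered <- ddply(shuffled, .(id, cat), function(x) {
  x$id2 <- seq_len(nrow(x))
  x
})

# Cast with id2 in the formula, then drop the helper column (column 2)
res <- dcast(numbered, id + id2 ~ cat, value.var = "val")[, -2]
res
```

The result still has one row per id plus one extra row for the duplicated C/SV entry. Which of the two duplicates shares a row with the SS value follows the input order.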

Maiasaura


That was so helpful for a similar case for me. Thank you :) — sarah, Apr 19 '16 at 14:23
You saved my day!! — Marco Fumagalli, Jan 17 '18 at 18:02
