
Is it possible to change the default separator when cast (dcast) assigns new column headers?

I am converting a file from long to wide, and I get the following headers:

value_1, value_2, value_3,...  

With reshape you can set the "sep" argument (sep="") and the column headers come out the way I want them:

value1, value2, value3,... 

However, reshape takes minutes for my data frame with over 200,000 rows, whereas dcast takes seconds. dcast also outputs the columns in the order I want, where reshape does not. Is there any easy way to change the output with dcast, or do I need to change the column headers manually?

For example:

library(reshape2)
example <- data.frame(id = rep(c(1, 2, 3, 4), 4), index = c(rep(1, 4), rep(2, 4), rep(1, 4), rep(2, 4)), variable = c(rep("resp", 8), rep("conc", 8)), value = rnorm(16, 5, 1))
dcast(example, id ~ variable + index)

The example gives the column headers:

conc_1, conc_2, resp_1, resp_2

I want the column headers to read:

conc1, conc2, resp1, resp2

I have tried:

dcast(example,id~variable+index,sep="")

dcast appears to ignore sep entirely; supplying a symbol as the separator does not change the output either.

dayne

4 Answers


You can't: reshape2's dcast has no sep argument. It is easy to rename the columns after running dcast, though.

library(reshape2)
library(stringr)

casted_data <- dcast(example, id ~ variable + index)
# str_replace() changes only the first "_" in each name
names(casted_data) <- str_replace(names(casted_data), "_", ".")

> casted_data
  id   conc.1   conc.2   resp.1   resp.2
1  1 5.554279 5.225686 5.684371 5.093170
2  2 4.826810 5.484334 5.270886 4.064688
3  3 5.650187 3.587773 3.881672 3.983080
4  4 4.327841 4.851891 5.628488 4.305907

# If you need to do this often, just wrap dcast in a function and 
# change the names before returning the result.

f <- function(df, ..., sep = ".") {
    res <- dcast(df, ...)
    names(res) <- str_replace(names(res), "_", sep)
    res
}

> f(example, id~variable+index, sep = "")
  id    conc1    conc2    resp1    resp2
1  1 5.554279 5.225686 5.684371 5.093170
2  2 4.826810 5.484334 5.270886 4.064688
3  3 5.650187 3.587773 3.881672 3.983080
4  4 4.327841 4.851891 5.628488 4.305907
Maiasaura
  • I know replacing the names is easy, but I was looking for a way around that if possible. I also have other column headers with an underscore, so it would take a few more lines of code - although it would still be easy enough. – dayne Sep 20 '12 at 17:21
  • This _is_ the way around. You could rewrite `dcast` and internal functions from `reshape2` but that would be even more work and completely unnecessary. Wrapping in a function will leave it at the same number of lines of code (just 1) – Maiasaura Sep 20 '12 at 17:23
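
For the case raised in the comments, where other column headers also contain an underscore, the renaming step can be limited to the newly cast columns and to the last underscore in each name. The following is only a sketch in base R: the helper `strip_last_sep`, the column `site_code`, and the name `resp_a_1` are hypothetical and stand in for whatever dcast actually returns for your data.

```r
# Hypothetical column names, as dcast might return them when some
# id columns and variable names already contain underscores.
cast_names <- c("id", "site_code", "conc_1", "conc_2", "resp_a_1")

# Columns that were on the left-hand side of the formula (assumed here).
id_cols <- c("id", "site_code")

# Replace only the LAST underscore of each non-id column name with `sep`.
# `sep` is assumed to be a plain string without backslashes.
strip_last_sep <- function(nms, keep, sep = "") {
  is_cast <- !(nms %in% keep)
  nms[is_cast] <- sub("_([^_]*)$", paste0(sep, "\\1"), nms[is_cast])
  nms
}

new_names <- strip_last_sep(cast_names, id_cols)
print(new_names)
```

Inside a wrapper like `f` above, the same call could be applied to `names(res)` before returning the result.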

dcast in the data.table package (dev version 1.9.5) now has the 'sep' argument.

dbetebenner

Based on information provided by dbetebenner and another example of using data.table for improved dcast functionality, your example becomes:

> library(data.table)
> dcast(setDT(example), id ~ variable + index, sep="")
   id    conc1    conc2    resp1    resp2
1:  1 5.113707 5.475527 5.938592 4.149636
2:  2 4.261278 6.138082 5.277773 5.907054
3:  3 4.350663 4.292398 6.277582 4.167552
4:  4 5.993198 6.601669 5.232375 5.037936

setDT() converts lists and data.frames to data.tables by reference.

Tested with data.table v1.9.6.

arekolek

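One option is to paste the two variables into a single key column before casting, so dcast never has to join them: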

example <- data.frame(example,by=paste(example$variable,example$index,sep=""))
dcast(example,id~by)
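
To see why the pre-pasted key gives the headers you want, here is a rough sketch that needs no packages. It swaps dcast for base R's `tapply` purely so the example is self-contained; the column name `key` and the fixed seed are assumptions for illustration, and with dcast the idea is the same.

```r
set.seed(1)  # assumed seed, only so the example is reproducible

example <- data.frame(
  id       = rep(1:4, 4),
  index    = rep(c(1, 2, 1, 2), each = 4),
  variable = rep(c("resp", "conc"), each = 8),
  value    = rnorm(16, 5, 1)
)

# Build the final header text up front, with no separator.
example$key <- paste0(example$variable, example$index)

# Base-R stand-in for dcast(example, id ~ key): one cell per id/key pair.
# Each pair occurs exactly once here, so sum() just returns that value.
wide <- tapply(example$value, list(id = example$id, key = example$key), sum)
wide <- data.frame(id = as.numeric(rownames(wide)), wide, row.names = NULL)

print(names(wide))
```

Because the separator is chosen in `paste0` (or `paste(..., sep = "")`), the casting step only ever sees one column on the right-hand side and has nothing to join.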
dayne