I have a data frame in semi-long format with id variables a and b and measured data in columns m1 and m2. The type of data is specified by the variable v (values var1 and var2).

set.seed(8)

df_l <- 
  data.frame(
    a = rep(sample(LETTERS,5),2),
    b = rep(sample(letters,5),2),
    v = c(rep("var1",5),rep("var2",5)),
    m1 = sample(1:10, 10, replace = FALSE),
    m2 = sample(20:40, 10, replace = FALSE))

It looks like this:

   a b    v m1 m2
1  W r var1  3 40
2  N l var1  6 32
3  R a var1  9 28
4  F g var1  5 21
5  E u var1  4 38
6  W r var2  1 35
7  N l var2  8 33
8  R a var2 10 29
9  F g var2  7 30
10 E u var2  2 23

If I want a wide format of the values in m1, using id a as rows and the values of v as columns, I do:

> reshape2::dcast(df_l, a~v, value.var="m1")
  a var1 var2
1 E    4    2
2 F    5    7
3 N    6    8
4 R    9   10
5 W    3    1

How do I write a function that does this, where the inputs to dcast (row, column and value.var) are supplied as function arguments? Something like:

fun <- function(df,row,col,val){
  require(reshape2)
  res <-
    dcast(df, row~col, value.var=val)
  return(res)
}

I checked SO here and here and tried variations of match.call and eval(substitute()) in order to "get" the arguments inside the function, and also tried the lazyeval package. No success.

What am I doing wrong here? How do I get dcast to recognize the variable names?

user3375672

1 Answer

The formula argument of dcast also accepts character input, so the formula can be built as a string from the function arguments.

foo <- function(df, id, measure, val) {
    dcast(df, paste(paste(id, collapse = " + "), "~", 
                    paste(measure, collapse = " + ")), 
          value.var = val)
}

require(reshape2)
foo(df_l, "a", "v", "m1")
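
As a variation on foo, the string can be converted to a formula object explicitly with as.formula before it is passed to dcast. This is only a minimal sketch under assumptions: cast_wide is a hypothetical name, and df_l is retyped by hand from the table printed in the question so that it does not depend on the random seed.

```r
library(reshape2)

# Data retyped from the printed table in the question (assumption:
# the values are copied by hand rather than regenerated with sample()).
df_l <- data.frame(
  a  = rep(c("W", "N", "R", "F", "E"), 2),
  b  = rep(c("r", "l", "a", "g", "u"), 2),
  v  = rep(c("var1", "var2"), each = 5),
  m1 = c(3, 6, 9, 5, 4, 1, 8, 10, 7, 2),
  m2 = c(40, 32, 28, 21, 38, 35, 33, 29, 30, 23)
)

# Hypothetical helper: id and measure are character vectors of column names.
cast_wide <- function(df, id, measure, val) {
  lhs <- paste(id, collapse = " + ")       # e.g. "a + b"
  rhs <- paste(measure, collapse = " + ")  # e.g. "v"
  f <- as.formula(paste(lhs, "~", rhs))    # string -> formula object
  dcast(df, f, value.var = val)
}

# Two id columns as rows, the values of v as columns, m1 as the cell values.
wide <- cast_wide(df_l, c("a", "b"), "v", "m1")
print(wide)
```

Passing a vector such as c("a", "b") for id keeps both id columns in the result, which is what the collapse = " + " step is for.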

Note that data.table's dcast (current development version, v1.9.5) can also cast multiple value.var columns directly, so you can also do:

require(data.table) # v1.9.5
foo(setDT(df_l), "a", "v", c("m1", "m2"))
#    a m1_var1 m1_var2 m2_var1 m2_var2
# 1: F       1       6      28      21
# 2: H       9       2      38      29
# 3: M       5      10      24      35
# 4: O       8       3      23      26
# 5: T       4       7      31      39
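
For a version that can be run without the random seed, here is a small sketch of the same multi-column cast. It assumes a data.table version whose dcast accepts several value.var columns, and it retypes df_l by hand from the table printed in the question; as.data.table is used instead of setDT so that the original data frame is left unmodified.

```r
library(data.table)

# Data retyped from the printed table in the question (assumption:
# the values are copied by hand rather than regenerated with sample()).
df_l <- data.frame(
  a  = rep(c("W", "N", "R", "F", "E"), 2),
  b  = rep(c("r", "l", "a", "g", "u"), 2),
  v  = rep(c("var1", "var2"), each = 5),
  m1 = c(3, 6, 9, 5, 4, 1, 8, 10, 7, 2),
  m2 = c(40, 32, 28, 21, 38, 35, 33, 29, 30, 23)
)

# Copy into a data.table; df_l itself stays a plain data.frame.
dt <- as.data.table(df_l)

# Cast both measured columns at once; result columns are named
# <value.var>_<level of v>.
wide_dt <- data.table::dcast(dt, a ~ v, value.var = c("m1", "m2"))
print(wide_dt)
```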
Arun
  • Two very good answers - I will chew further on the setDT function, seems awfully useful. – user3375672 Jul 08 '15 at 14:37
  • On the side: why does my line not work: dcast(df, row~col, value.var=val) ? – user3375672 Jul 08 '15 at 15:02
  • The error message from `reshape2::dcast` is quite cryptic. It's because `row ~ col` results in `dcast` looking for columns named `row` and `col` in your `df`. Install devel version of dt, load it and run `dcast(setDT(df), row ~ col, value.var="m1")`, and the error message should be quite clear. – Arun Jul 08 '15 at 15:05
  • multiple variable option is really useful! – JelenaČuklina May 10 '18 at 19:05
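
The point made in the comments, that row ~ col is taken literally, can be checked in base R without dcast. The following is only an illustrative sketch; quoted_vars and built_vars are hypothetical helper names.

```r
# A formula written inside a function is not evaluated: the symbols
# row and col stay as they are, whatever values the arguments hold.
quoted_vars <- function(row, col) {
  all.vars(row ~ col)
}

# Building the formula from the argument values inserts the column names.
built_vars <- function(row, col) {
  all.vars(as.formula(paste(row, "~", col)))
}

quoted_vars("a", "v")  # "row" "col"
built_vars("a", "v")   # "a" "v"
```

This is why dcast ends up looking for columns literally named row and col in the first case.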