Pass user input to 'by' in data.table and the formula in reshape - r

Question

Below is an example of what I want to do. eval(substitute(*)) works well, as shown here, but makes the code a bit harder to read. I was wondering if there is anything better I am not aware of.

I want to be able to choose the row and column variables of the table (at the end). So, if I have

input.col <- 'Gender'
input.row <- 'Region'

I want to be able to pass these arguments to the data table instead of Region and Gender being used as below.

library(data.table)
library(reshape)
set.seed(5)
DT <- data.table(Region = sample(x = c('Asia', 'Americas', 'Africa', 'Europe', 'Oceania'), size = 200, replace = T), Weight = runif(n = 200, min = 1, max = 5), Age = round(x = 10*rexp(n = 200, rate = 1), digits = 0), Gender = sample(x = c('Male', 'Female', 'Gender diverse'), size = 200, replace = T, prob = c(0.49, 0.49, 0.02)))
cast(data = DT[, sum(Weight), .(Region, Gender)], formula = Region~Gender, fun.aggregate = sum, value = 'V1')

I want to get to the following table

Region   Female Gender diverse     Male
1   Africa 32.95019       3.222125 77.50863
2 Americas 49.12787       0.000000 84.97214
3     Asia 41.04879       0.000000 55.43294
4   Europe 45.39469       4.296767 47.76714
5  Oceania 65.89198       1.439075 72.27496

Thank you!

G. Grothendieck · Accepted Answer · 2017-07-24T00:45:39.507

Here are some possibilities. Except for (3) they only use data.table. All the approaches perform aggregation and reshaping in one operation so there is no need to use by in the first place. If you really did want to use by for some reason anyways then this would work:

cast(data = DT[, sum(Weight), by = c(input.row, input.col)], 
     formula = paste(input.row, "~", input.col), fun.aggregate = sum, value = 'V1')

1) data.table::dcast

dcast(DT, paste(input.row, "~", input.col), sum, value.var = "Weight")

giving:

     Region   Female Gender diverse     Male
1:   Africa 32.95019       3.222125 77.50863
2: Americas 49.12787       0.000000 84.97214
3:     Asia 41.04879       0.000000 55.43294
4:   Europe 45.39469       4.296767 47.76714
5:  Oceania 65.89198       1.439075 72.27496

2) xtabs xtabs is in the base of R:

fo <- sprintf("Weight ~ %s + %s", input.row, input.col)
xtabs(fo, DT)

giving:

          Gender
Region        Female Gender diverse      Male
  Africa   32.950187       3.222125 77.508626
  Americas 49.127873       0.000000 84.972137
  Asia     41.048787       0.000000 55.432941
  Europe   45.394693       4.296767 47.767138
  Oceania  65.891983       1.439075 72.274955

3) reshape::cast We will use the reshape package because the question does but note that it has been superceded by the reshape2 package and in reshape2 one would use dcast; however, dcast is also implemented in data.table as per (1).

cast(DT, paste(input.row, "~", input.col), sum, value = "Weight")

giving:

    Region   Female Gender diverse     Male
1   Africa 32.95019       3.222125 77.50863
2 Americas 49.12787       0.000000 84.97214
3     Asia 41.04879       0.000000 55.43294
4   Europe 45.39469       4.296767 47.76714
5  Oceania 65.89198       1.439075 72.27496

4) tapply

tapply(DT$Weight, as.list(DT)[c(input.row, input.col)], sum, default = 0)

giving:

          Gender
Region       Female Gender diverse     Male
  Africa   32.95019       3.222125 77.50863
  Americas 49.12787       0.000000 84.97214
  Asia     41.04879       0.000000 55.43294
  Europe   45.39469       4.296767 47.76714
  Oceania  65.89198       1.439075 72.27496

This is a great list. It's length is also proportional to my ignorance! Thank you. — Ameya, Jul 24 '17 at 00:34

score 3 · Answer 2 · edited Jul 23 '17 at 23:43

You can use get and then rename the variables which can be further used in the formula:

input.col <- 'Gender'
input.row <- 'Region'

dt <- cast(data = DT[, sum(Weight), .(row = get(input.row), col = get(input.col))], 
#                                     ^^^   ^^^             ^^^   ^^^  
           formula = row ~ col, fun.aggregate = sum, value = 'V1')

dt
#       row   Female Gender diverse     Male
#1   Africa 32.95019       3.222125 77.50863
#2 Americas 49.12787       0.000000 84.97214
#3     Asia 41.04879       0.000000 55.43294
#4   Europe 45.39469       4.296767 47.76714
#5  Oceania 65.89198       1.439075 72.27496

Pass user input to 'by' in data.table and the formula in reshape - r

2 Answers2