Using this dummy dataset:

library(data.table)
mtcars_copy <- as.data.table(mtcars) # data.table copy of the built-in dataset
new_col <- "sum_carb" # for dynamic column referencing

Why does Case 1 work but not Case 2?

# Case 1 - Works fine
mtcars_copy[,eval(new_col):=sum(carb)] # Works fine


# Case 2 - Doesn't work
aggregate_mtcars<-mtcars_copy[,(eval(new_col)=sum(carb))] # error
aggregate_mtcars<-mtcars_copy[,eval(new_col)=sum(carb))] # error
aggregate_mtcars<-mtcars_copy[,c(eval(new_col)=sum(carb))] # Error

How can I get Case 2 to work? I don't want the main table (mtcars_copy in this case) to hold the new columns; the results should be stored in a separate aggregation table (aggregate_mtcars).

jogo
ashleych

3 Answers

I think what you want is to simply make a copy when doing case 1.

aggregate_mtcars <- copy(mtcars_copy)[, eval(new_col) := sum(carb)]

That keeps mtcars_copy, without the new column, as a separate dataset from the new aggregate_mtcars.

Jaccar
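
As a quick check of the copy approach above, here is a minimal sketch (assuming the data.table package is installed; the setup lines are repeated so it runs on its own). It shows that only the copy gains the new column, and that the result still has one row per car rather than a single aggregated row:

```r
library(data.table)

mtcars_copy <- as.data.table(mtcars)
new_col <- "sum_carb"  # column name held in a variable

# copy() makes a deep copy, so := adds the column to the copy only
aggregate_mtcars <- copy(mtcars_copy)[, (new_col) := sum(carb)]

new_col %in% names(mtcars_copy)              # FALSE: original is untouched
new_col %in% names(aggregate_mtcars)         # TRUE
nrow(aggregate_mtcars) == nrow(mtcars_copy)  # TRUE: not collapsed to one row
```

Here `(new_col) :=` is used in place of `eval(new_col) :=`; both tell data.table to take the column name from the variable.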

One option is to use the base R function setNames

aggregate_mtcars <- mtcars_copy[, setNames(.(sum(carb)), new_col)]

Or you could use data.table::setnames

aggregate_mtcars <- setnames(mtcars_copy[, .(sum(carb))], new_col)
IceCreamToucan
  • Yeah, but I am wondering if the rename/setnames option can be avoided. My original problem is more complicated, and setnames would just make it very messy – ashleych Jun 20 '19 at 12:47
  • @ashleych In general, you cannot dynamically assign/name like `make_name(x,y) = z` in R, unfortunately. Inside a data.table DT[...] with `:=` is just an exception to that rule that the package designers implemented. – Frank Jun 20 '19 at 13:46
  • looks like this indeed is the most efficient way, then. – ashleych Jun 20 '19 at 13:48
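
To illustrate the point made in the comments, here is a small sketch (assuming data.table is installed). The first part shows that a computed name on the left of `=` inside a call does not parse in R. The second part extends the setNames idea from the answer to a grouped aggregation with several dynamic names; `new_cols` and the grouping by `cyl` are illustrative choices, not part of the original question:

```r
library(data.table)

# 1. A computed name on the left of `=` inside a call is a syntax error
parsed <- tryCatch(parse(text = "list(eval(nm) = 1)"),
                   error = function(e) "syntax error")
parsed  # "syntax error"

# 2. Workaround: build the result as a list, then name it with setNames()
dt <- as.data.table(mtcars)
new_cols <- c("sum_carb", "mean_mpg")  # hypothetical dynamic names

agg <- dt[, setNames(list(sum(carb), mean(mpg)), new_cols), by = cyl]

names(agg)  # "cyl" "sum_carb" "mean_mpg"
ncol(dt)    # 11: the source table gains no columns
```

Because the naming happens inside j, the source table is never modified and no separate rename step is needed afterwards.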
The reason is that Case 2 tries to create the column the data.frame way (as a new list element). data.table also has a less obvious parameter, with, that controls how j is interpreted and therefore what is returned: a data.table or a vector.

?data.table :
By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. In case of overlapping variables names inside dataset and in parent scope you can use double dot prefix ..cols to explicitly refer to 'cols variable parent scope and not from your dataset.

When j is a character vector of column names, a numeric vector of column positions to select or of the form startcol:endcol, and the value returned is always a data.table. with=FALSE is not necessary anymore to select columns dynamically. Note that x[, cols] is equivalent to x[, ..cols] and to x[, cols, with=FALSE] and to x[, .SD, .SDcols=cols].

# Case 2 :
aggregate_mtcars<-mtcars_copy[,(get(new_col)=sum(carb))] # error
aggregate_mtcars<-mtcars_copy[,eval(new_col)=sum(carb))] # error
aggregate_mtcars<-mtcars_copy[,c(eval(new_col)=sum(carb))] # Error

mtcars_copy[, new_col, with = FALSE ] # gives a data.table
mtcars_copy[, eval(new_col), with = FALSE ] # this works and creates a data.table
mtcars_copy[, eval(new_col), with = TRUE ] # with = TRUE is the default, as used in the failing Case 2 calls
mtcars_copy[, get(new_col), with = TRUE ] # works and gives a vector

# Case 2 solution: assigning values the data.frame way
mtcars_copy[, eval(new_col) ] <- sum(mtcars_copy$carb) # or any vector
mtcars_copy[[eval(new_col)]] <- sum(mtcars_copy$carb) # or any vector
cbo
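
The difference between the selection forms discussed above can be checked with a short sketch (assuming data.table is installed). It uses the existing column carb so that it runs without Case 1 having been executed first; `col` is an illustrative variable name:

```r
library(data.table)

dt <- as.data.table(mtcars)
col <- "carb"  # name of an existing column, held in a variable

a <- dt[, col, with = FALSE]  # one-column data.table
b <- dt[, ..col]              # same selection using the .. prefix
v <- dt[, get(col)]           # plain vector of the column's values

is.data.table(a)         # TRUE
isTRUE(all.equal(a, b))  # TRUE
is.data.table(v)         # FALSE
length(v) == nrow(dt)    # TRUE
```

Note that these forms only select an existing column by name; they do not create or rename one.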