47

I'm trying to manipulate a number of data.tables in similar ways, and would like to write a function to accomplish this. I would like to pass in a parameter containing a list of columns that would have the operations performed. This works fine when the vector declaration of columns is the left hand side of the := operator, but not if it is declared earlier (or passed into the function). The follow code shows the issue.

dt = data.table(a = letters, b = 1:2, c=1:13)
colsToDelete = c('b', 'c')
dt[,colsToDelete := NULL] # doesn't work but I don't understand why not.
dt[,c('b', 'c') := NULL] # works fine, but doesn't allow passing in of columns

The error is "Adding new column 'colsToDelete' then assigning NULL (deleting it)." So clearly, it's interpreting 'colsToDelete' as a new column name.

The same issue occurs when doing something along these lines

dt[, colNames := lapply(.SD, adjustValue, y=factor), .SDcols = colNames]

I new to R, but rather more experienced with some other languages, so this may be a silly question.

user3704757
  • 571
  • 1
  • 4
  • 4

4 Answers4

67

It's basically because we allow symbols on LHS of := to add new columns, for convenience: ex: DT[, col := val]. So, in order to distinguish col itself being the name from whatever is stored in col being the column names, we check if the LHS is a name or an expression.

If it's a name, it adds the column with the name as such on the LHS, and if expression, then it gets evaluated.

DT[, col := val] # col is the column name.

DT[, (col) := val]  # col gets evaluated and replaced with its value
DT[, c(col) := val] # same as above

The preferred idiom is: dt[, (colsToDelete) := NULL]

HTH

Matt Dowle
  • 58,872
  • 22
  • 166
  • 224
Arun
  • 116,683
  • 26
  • 284
  • 387
21

I am surprised no answer provided uses the set() function.

set(DT, , colsToDelete, NULL)

This should be the easiest.

Feng Jiang
  • 1,776
  • 19
  • 25
14

To extend on previous answer, you can delete columns by reference doing:

# delete columns 10 to 15
dt[ , (10:15) := NULL ]

or

# delete columns 3, 5 and 10 to 15
dt[ , (c(3,5,10:15)) := NULL ]
rafa.pereira
  • 13,251
  • 6
  • 71
  • 109
0

This code did the job for me. you need to have the position of the columns to be deleted e.g., posvec as mentioned in the ?set

j: Column name(s) (character) or number(s) (integer) to be assigned value when column(s) already exist, and only column name(s) if they are to be created.

DT_removed_slected_col = set(DT, j = posvec, value = NULL)

Also if you want to get the posvec you can try this:

selected_col = c('col_a','col_b',...)

selected_col = unlist(sapply(selected_col, function(x) grep(x,names(DT)))) 

namvec = names(selected_col) #col names

posvec = unname(selected_col) #col positions

Sheykhmousa
  • 139
  • 9