I want to change the class of selected columns in a data.table using a vectorized operation. I am new to the data.table syntax and am trying to learn as much as possible. I know the question is basic, but it will help me better understand the data.table way of thinking!

A similar question was asked here. However, the solutions there reclass either a single column or all columns, whereas my question concerns a selected subset of columns.

### Load package
library(data.table)

### Create pseudo data
data <- data.table(id     = 1:10,
                   height = rnorm(10, mean = 182, sd = 20),
                   weight = rnorm(10, mean = 160, sd = 10),
                   color  = rep(c('blue', 'gold'), times = 5))

### Reclass all columns
data <- data[, lapply(.SD, as.character)]

### Search for columns to be reclassed
index <- grep('(id)|(height)|(weight)', names(data))

### data frame method
df <- data.frame(data)
df[, index] <- lapply(df[, index], as.numeric)

### Failed attempt to reclass columns using the data.table method
data <- data[, lapply(index, as.character), with = F]

Any help would be appreciated. My data are large, so I need to use regular expressions to build the vector of column numbers to reclass.

Thank you for your time.

Andreas

3 Answers

You could avoid the overhead of constructing `.SD` within `j` by using `set`:

for (j in index) set(data, j = j, value = as.character(data[[j]]))
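
A self-contained sketch of this loop, reusing the question's example data. The `set.seed` call and the final class check are additions for illustration, and `as.numeric` is used instead of `as.character` so that the change is visible after the question's "reclass all columns" step. Only the columns in `index` are converted; `color` stays as it was.

```r
library(data.table)

set.seed(1)  # illustrative only, so the example is reproducible
data <- data.table(id     = 1:10,
                   height = rnorm(10, mean = 182, sd = 20),
                   weight = rnorm(10, mean = 160, sd = 10),
                   color  = rep(c('blue', 'gold'), times = 5))

# Start from all-character columns, as in the question
data <- data[, lapply(.SD, as.character)]

# Positions of the columns to convert
index <- grep('(id)|(height)|(weight)', names(data))

# set() updates each column by reference, without building .SD
for (j in index) set(data, j = j, value = as.numeric(data[[j]]))

# Inspect the resulting classes
classes <- sapply(data, class)
print(classes)
```

Because `set` modifies `data` in place, no reassignment with `<-` is needed.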
mnel
  • (+1) This is fast/efficient on 2 accounts: 1) no `.SD` and 2) using `set` instead of `:=` (the latter of which has the `[.data.table` overhead). Brilliant! – Arun Apr 25 '13 at 22:48

I think that @SimonO101 did most of the job.

data[, names(data)[index] := lapply(.SD, as.character), .SDcols = index]

You can just use the `:=` magic.
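
As a hedged variation on the same idea, the columns can also be selected by name. In the sketch below the small table, the anchored regular expression and the variable `cols` are made up for illustration; wrapping `cols` in parentheses on the left of `:=` tells data.table to use the names stored in the variable.

```r
library(data.table)

# Small illustrative table whose columns all start as character
data <- data.table(id     = as.character(1:10),
                   height = as.character(seq(170, 188, by = 2)),
                   color  = rep(c('blue', 'gold'), times = 5))

# Hypothetical selection by name; value = TRUE returns names, not positions
cols <- grep('^(id|height)$', names(data), value = TRUE)

# Convert only the selected columns, by reference; color is left untouched
data[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]

classes <- sapply(data, class)
print(classes)
```

Anchoring the pattern with `^` and `$` avoids accidentally matching other column names that merely contain `id` or `height`.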

dickoa
  • +1 that's it!! Ok, since my answer is incorrect I'm going to delete it. – Simon O'Hanlon Apr 25 '13 at 21:53
  • (+1) You can directly pass `index` as well: `data[, c(index) := lapply(.SD, as.character), .SDcols = index]` – Arun Apr 25 '13 at 22:34
  • Alternatively, without using `.SDcols`: `data[, c(index) := lapply(data[, 1:3, with=FALSE], as.character)]` – Arun Apr 25 '13 at 22:38
  • Strangely, I copy/pasted this solution, which ran successfully but didn't actually change the columns named in my `index`... @mnel's answer did the trick for me. – NiuBiBang Jul 14 '15 at 14:18

You just need to use `.SDcols` with your index vector (I learnt that today!), but that will only return a data table containing the reclassed columns. @dickoa's answer is what you are looking for.

data <- data[, lapply(.SD, as.character) , .SDcols = index ]
sapply(data , class)
        id      height      weight 
"character" "character" "character" 
Simon O'Hanlon
  • This creates a new data table with only the "index" columns. How can I change the class of the "index" columns and keep the rest of the data.table intact? I can easily see how to do this using a merge or cbind, but there has got to be a more elegant way! – Andreas Apr 25 '13 at 21:47
  • Argggh. You are right. I know this, but I also have trouble with the syntax. There is an easy way - trying to remember the correct syntax!! – Simon O'Hanlon Apr 25 '13 at 21:51
  • This is important to note. Thank you for editing your response to reflect what we learned! – Andreas Apr 25 '13 at 22:01