
Suppose I have the following data frame, db:

db<-data.frame(para=c(round(rnorm(20,10,10),0)),sal1=c(rnorm(20,100,7)),sal2=c(rnorm(20,100,7)),sal3=c(rnorm(10,100,7)),sal4=rep(c("a","b"),5))

   para      sal1      sal2      sal3 sal4
1    -3  89.72090 105.79164 101.09462    a
2     3 102.64036 104.07501  96.41335    b
3    11 104.65196  90.49886 101.81897    a
4    27  99.61455 102.23207 108.41161    b
5    24 101.18734  98.16081 103.04760    a

I want only sal1, sal2 and sal3 as numeric, with the rest left as is. The solution should generalise, as I have 118 columns that I want as numeric while keeping the rest unchanged.

I tried:

check<-names(db)
db<-db[as.numeric(get(check[which(check=="sal1"):(which(check=="sal1")+2)]))]

But I think this is just a shot in the dark.

Abhijeet Arora

1 Answer


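We can use grep to select the columns to convert (here 'sal1' to 'sal3'; a broader pattern such as "^sal\\d+" would also match 'sal4', which holds letters and should stay as is), use that index to subset 'db', loop through the columns with lapply, convert each to numeric, and assign the output back to 'db[nm1]'.

 nm1 <- grep("^sal[1-3]$", names(db))
 db[nm1] <- lapply(db[nm1], as.numeric)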
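When the target columns are adjacent rather than sharing a name pattern, which is what the attempt in the question seems to be aiming for, they can be picked by position instead. The following is only a sketch on toy data: the names `start` and `cols` and the character-typed 'sal' columns are assumptions made for illustration, not part of the original data.

```r
# Toy data: the "sal" columns are stored as character, as they might be after import.
set.seed(1)
db <- data.frame(
  para = round(rnorm(6, 10, 10)),
  sal1 = as.character(round(rnorm(6, 100, 7), 2)),
  sal2 = as.character(round(rnorm(6, 100, 7), 2)),
  sal3 = as.character(round(rnorm(6, 100, 7), 2)),
  sal4 = rep(c("a", "b"), 3),
  stringsAsFactors = FALSE
)

# Pick a run of three adjacent columns by position, starting at "sal1".
start <- match("sal1", names(db))
cols  <- names(db)[start:(start + 2)]

# Convert only those columns; everything else is left untouched.
db[cols] <- lapply(db[cols], as.numeric)

# Inspect the resulting column classes.
sapply(db, class)
```

For 118 columns, only the offset in `start:(start + 2)` would need to change, or `cols` could be supplied directly as a character vector of names.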

If we need it in data.table, convert the 'data.frame' to 'data.table' (setDT(db)), specify the columns in .SDcols, loop through the Subset of Data.table (.SD), convert to numeric and assign (:=) it back to same column names.

library(data.table)
setDT(db)[, (nm1) := lapply(.SD, as.numeric), .SDcols = nm1]
akrun
  • Thanks a lot, this serves my purpose. – Abhijeet Arora Aug 08 '16 at 13:00
  • Hey, when I converted using `setDT(db)[, (nm1) := lapply(.SD, as.numeric), .SDcols = nm1]`, the columns were factors, so my 0's got converted into 1 and my 1's into 2. Is there any workaround for that? Quick help would be appreciated. – Abhijeet Arora Aug 09 '16 at 12:20
  • @AbhijeetArora Okay, in that case you have to first convert to `character` and then to `numeric`, i.e. `setDT(db)[, (nm1) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = nm1]` – akrun Aug 09 '16 at 12:22
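
The factor behaviour raised in the comments can be reproduced in base R: `as.numeric()` on a factor returns the internal level codes, not the labels. Below is a minimal sketch; the helper `to_numeric` and the toy data frame `df` are hypothetical names introduced here for illustration only.

```r
# A factor whose labels look like numbers.
f <- factor(c("0", "1", "1", "0"))

as.numeric(f)                # 1 2 2 1 (internal level codes, not the labels)
as.numeric(as.character(f))  # 0 1 1 0 (the labels parsed as numbers)

# Hypothetical helper: go through character for factors, convert directly otherwise.
to_numeric <- function(x) {
  if (is.factor(x)) as.numeric(as.character(x)) else as.numeric(x)
}

# Toy data with one factor column and one character column.
df <- data.frame(
  flag  = f,
  score = c("1.5", "2", "3.25", "4"),
  stringsAsFactors = FALSE
)
df[] <- lapply(df, to_numeric)
```

A helper like this can be passed to `lapply` in place of `as.numeric` when it is not known in advance which columns are factors.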