2

I convert my columns' data types manually:

data[,'particles'] <- as.numeric(as.character(data[,'particles']))

This is not ideal, as the data may evolve and I won't know in advance which species are coming. For instance, they could be "nox", "no2", "co", "so2", "pm10", and more in the future.

Is there any way to convert them automatically?

My current dataset:

structure(list(particles = structure(c(1L, 3L, 5L, 5L, 5L, 5L, 
    5L, 5L, 5L, 5L, 5L, 6L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 4L, 4L, 
    4L, 3L, 3L, 3L, 3L, 5L, 6L, 5L, 3L), .Label = c("1", "11", "1.1", 
    "2", "2.1", "3.1"), class = "factor"), humidity = structure(c(4L, 
    7L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 1L, 1L, 1L, 
    5L, NA, NA, NA, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0.1", 
    "1", "1.1", "1.3", "21", "2.1", "3"), class = "factor"), timestamp = c(1468833354929, 
    1468833365186, 1468833378458, 1468833538213, 1468833538416, 1468833538613, 
    1468833538810, 1468833538986, 1468833539172, 1468833539358, 1468833539539, 
    1468833554592, 1468833559059, 1468833562357, 1468833566225, 1468833573486, 
    1468840019118, 1468840024950, 1469029568849, 1469029584243, 1469029590530, 
    1469029622391, 1469029623598, 1469245154003, 1469245156533, 1469245156815, 
    1469245157123, 1469245162358, 1469245165911, 1469245170178, 1469245173788
    ), date = structure(c(1468833354.929, 1468833365.186, 1468833378.458, 
    1468833538.213, 1468833538.416, 1468833538.613, 1468833538.81, 
    1468833538.986, 1468833539.172, 1468833539.358, 1468833539.539, 
    1468833554.592, 1468833559.059, 1468833562.357, 1468833566.225, 
    1468833573.486, 1468840019.118, 1468840024.95, 1469029568.849, 
    1469029584.243, 1469029590.53, 1469029622.391, 1469029623.598, 
    1469245154.003, 1469245156.533, 1469245156.815, 1469245157.123, 
    1469245162.358, 1469245165.911, 1469245170.178, 1469245173.788
    ), class = c("POSIXct", "POSIXt"), tzone = "Asia/Singapore")), .Names = c("particles", 
    "humidity", "timestamp", "date"), row.names = c(NA, -31L), class = "data.frame")

It has the columns particles, humidity, timestamp, and date.

Jaap
Run

5 Answers

5

Another option is mutate_if() from dplyr, which lets you operate on the columns for which a predicate returns TRUE:

library(dplyr)
# funs() has been deprecated since dplyr 0.8.0; a formula lambda does the same job
df %>% 
  mutate_if(is.factor, ~ as.numeric(as.character(.x)))

Note: This method will work for your follow-up question as well.
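In dplyr 1.0.0 and later, the same predicate-based conversion is usually written with across() and where(). The following is only a minimal sketch under that version assumption; the toy data frame `df` and the name `converted` are made up for illustration and are not the asker's data.

```r
library(dplyr)

# Toy data (hypothetical): factor columns that actually hold numbers
df <- data.frame(
  particles = factor(c("1", "2.1", "11")),
  humidity  = factor(c("0.1", "3", "1.3")),
  timestamp = c(1468833354929, 1468833365186, 1468833378458)
)

# Convert every factor column through its labels; other columns are untouched
converted <- df %>%
  mutate(across(where(is.factor), ~ as.numeric(as.character(.x))))

str(converted)
```

Because the predicate picks the columns, new species columns such as "nox" or "so2" would be converted without being named, as long as they arrive as factors.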

Community
Steven Beaupré
4

If you don't know beforehand which columns need to be converted, you can extract that information from your data frame as follows:

vec <- sapply(dat, is.factor)

which gives:

> vec
particles  humidity timestamp      date 
     TRUE      TRUE     FALSE     FALSE 

You can then use this vector to do the conversion on the subset with lapply:

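# notation option one (drop = FALSE keeps a single matching column as a data frame):
dat[, vec] <- lapply(dat[, vec, drop = FALSE], function(x) as.numeric(as.character(x)))
# notation option two (list-style indexing always returns a data frame):
dat[vec] <- lapply(dat[vec], function(x) as.numeric(as.character(x)))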
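As a small worked example of why the detour through as.character() matters, here is a sketch on a toy data frame (the values are made up; only the column names follow the question). Calling as.numeric() directly on a factor returns its internal level codes rather than the numbers shown in the labels.

```r
# Toy data (hypothetical), with an NA to show that missing values survive
dat <- data.frame(
  particles = factor(c("1", "2.1", "11")),
  humidity  = factor(c("0.1", NA, "1.3")),
  timestamp = c(1468833354929, 1468833365186, 1468833378458)
)

# as.numeric() on a factor gives the level codes, not the values
codes <- as.numeric(dat$particles)

# Flag the factor columns, then convert them through their character labels
vec <- sapply(dat, is.factor)
dat[vec] <- lapply(dat[vec], function(x) as.numeric(as.character(x)))

str(dat)
```

Here `codes` holds the positions of each value among the sorted levels, while `dat$particles` ends up holding the actual numbers.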

If you want to detect both factor and character columns, you can use:

sapply(dat, function(x) is.factor(x)|is.character(x))
Jaap
2

We can use data.table:

library(data.table) 
setDT(df)[, lapply(.SD, function(x) if(is.factor(x)) as.numeric(as.character(x)) else x)]
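The call above returns a new data.table and leaves the original columns as they were unless the result is assigned. If an in-place update is preferred, one possible sketch uses `:=` together with `.SDcols`; the toy table `dt` and the helper vector `cols` are hypothetical names introduced here.

```r
library(data.table)

# Toy data (hypothetical): factor columns that actually hold numbers
dt <- data.table(
  particles = factor(c("1", "2.1", "11")),
  humidity  = factor(c("0.1", "3", "1.3")),
  timestamp = c(1468833354929, 1468833365186, 1468833378458)
)

# Names of the factor columns, detected rather than hard-coded
cols <- names(dt)[sapply(dt, is.factor)]

# Overwrite those columns by reference; no copy of dt is assigned
dt[, (cols) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = cols]

str(dt)
```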
akrun
1

I think the best option is apply.

You can do:

newD<-apply(data[,"names"], 2,function(x) as.numeric(as.character(x)))

where "names" stands for all the variables you want to convert. With 2 as the second argument, apply calls function(x) on every column of the first argument (with 1 it works by rows). You can save the result as a new dataset or overwrite the old one with

data[,"names"]<-apply....
Jan Sila
  • 1
    It means the function(x) is applied to each column. For example, colMeans(data) is equivalent to apply(data, 2, mean), and rowMeans(data) is equivalent to apply(data, 1, mean). Do you see the difference? – Jan Sila Jul 23 '16 at 09:33
  • 1
    @teelou It's `MARGIN` argument. `2` means function will be applied over columns of the data frame. If you want to apply a function over rows, you'll write 1. – narendra-choudhary Jul 23 '16 at 09:33
  • @Narendra and Jan for the explanation. – Run Jul 23 '16 at 09:38
  • 2
    `apply` is not the best option. It is designed for use on matrices and will turn this data frame into a matrix. – Rich Scriven Aug 25 '16 at 18:45
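To illustrate the point raised in the last comment, the sketch below compares apply and lapply on a toy data frame (made-up values; `res`, `single`, and `cols` are names introduced here for illustration). apply first coerces its input to a matrix, so the result is a matrix, and it fails outright when the selection collapses to a single vector.

```r
# Toy data (hypothetical): factor columns that actually hold numbers
df <- data.frame(
  particles = factor(c("1", "2.1", "11")),
  humidity  = factor(c("0.1", "3", "1.3"))
)
cols <- c("particles", "humidity")

# apply() coerces the data frame to a matrix and returns a matrix
res <- apply(df[, cols], 2, function(x) as.numeric(as.character(x)))
class(res)

# With one column, df[, "particles"] is a plain vector and apply() errors
single <- tryCatch(
  apply(df[, "particles"], 2, function(x) as.numeric(as.character(x))),
  error = function(e) "error"
)

# lapply() works column by column and keeps the data frame intact
df[cols] <- lapply(df[cols], function(x) as.numeric(as.character(x)))
str(df)
```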
1

Use lapply:

cols <- c("particles", "nox", ...)

data[,cols] <- lapply(data[,cols], function(x) as.numeric(as.character(x)))
Sumedh