How to convert all column data types to numeric and character dynamically?

Question

I currently convert my columns' data types manually:

data[,'particles'] <- as.numeric(as.character(data[,'particles']))

This is not ideal, as the data may evolve and I won't know in advance which species are coming; for instance, they could be "nox", "no2", "co", "so2", "pm10", and more in the future.

Is there any way to convert them automatically?

My current dataset:

structure(list(particles = structure(c(1L, 3L, 5L, 5L, 5L, 5L, 
    5L, 5L, 5L, 5L, 5L, 6L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 4L, 4L, 
    4L, 3L, 3L, 3L, 3L, 5L, 6L, 5L, 3L), .Label = c("1", "11", "1.1", 
    "2", "2.1", "3.1"), class = "factor"), humidity = structure(c(4L, 
    7L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 1L, 1L, 1L, 
    5L, NA, NA, NA, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0.1", 
    "1", "1.1", "1.3", "21", "2.1", "3"), class = "factor"), timestamp = c(1468833354929, 
    1468833365186, 1468833378458, 1468833538213, 1468833538416, 1468833538613, 
    1468833538810, 1468833538986, 1468833539172, 1468833539358, 1468833539539, 
    1468833554592, 1468833559059, 1468833562357, 1468833566225, 1468833573486, 
    1468840019118, 1468840024950, 1469029568849, 1469029584243, 1469029590530, 
    1469029622391, 1469029623598, 1469245154003, 1469245156533, 1469245156815, 
    1469245157123, 1469245162358, 1469245165911, 1469245170178, 1469245173788
    ), date = structure(c(1468833354.929, 1468833365.186, 1468833378.458, 
    1468833538.213, 1468833538.416, 1468833538.613, 1468833538.81, 
    1468833538.986, 1468833539.172, 1468833539.358, 1468833539.539, 
    1468833554.592, 1468833559.059, 1468833562.357, 1468833566.225, 
    1468833573.486, 1468840019.118, 1468840024.95, 1469029568.849, 
    1469029584.243, 1469029590.53, 1469029622.391, 1469029623.598, 
    1469245154.003, 1469245156.533, 1469245156.815, 1469245157.123, 
    1469245162.358, 1469245165.911, 1469245170.178, 1469245173.788
    ), class = c("POSIXct", "POSIXt"), tzone = "Asia/Singapore")), .Names = c("particles", 
    "humidity", "timestamp", "date"), row.names = c(NA, -31L), class = "data.frame")

It has four columns: particles, humidity, timestamp, and date.

score 5 · Answer 1 · edited May 23 '17 at 12:33

Another option is mutate_if() from dplyr, which lets you operate only on the columns for which a predicate returns TRUE:

library(dplyr)
df %>% 
  mutate_if(is.factor, funs(as.numeric(as.character(.))))

Note: This method will work for your follow-up question as well.
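Since this answer was written, funs() has been deprecated and the scoped verbs such as mutate_if() have been superseded by across(). The following is a minimal sketch of the same idea, assuming dplyr 1.0.0 or later; the small data frame is hypothetical and only stands in for the asker's data.

```r
library(dplyr)

# Small illustrative data frame (hypothetical values, not the asker's data)
df <- data.frame(
  particles = factor(c("1", "1.1", "2.1")),
  humidity  = factor(c("1.3", "3", NA)),
  timestamp = c(1468833354929, 1468833365186, 1468833378458)
)

# Convert every factor column to numeric via character; other columns are left untouched
converted <- df %>%
  mutate(across(where(is.factor), ~ as.numeric(as.character(.x))))

str(converted)
```

Going through as.character() first matters: calling as.numeric() directly on a factor returns its internal integer codes rather than the values shown in its labels.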

answered Jul 23 '16 at 13:12

Steven Beaupré

This solution is so elegant it should wear a top hat and ride in a Rolls Royce. – Minnow Aug 24 '16 at 19:21

Jaap · Accepted Answer · 2016-11-17T21:24:30.807

If you don't know which columns need to be converted beforehand, you can extract that info from your dataframe as follows:

vec <- sapply(dat, is.factor)

which gives:

> vec
particles  humidity timestamp      date 
     TRUE      TRUE     FALSE     FALSE

You can then use this vector to do the conversion on the subset with lapply:

# notation option one:
dat[, vec] <- lapply(dat[, vec], function(x) as.numeric(as.character(x)))
# notation option two:
dat[vec] <- lapply(dat[vec], function(x) as.numeric(as.character(x)))

If you want to detect both factor and character columns, you can use:

sapply(dat, function(x) is.factor(x)|is.character(x))
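Putting the detection and conversion steps together, a minimal base R sketch might look like this. The data frame below is a hypothetical stand-in with one factor, one character, and one numeric column; it is not the asker's dataset.

```r
# Illustrative data frame (hypothetical values): factor, character, and numeric columns
dat <- data.frame(
  particles = factor(c("1", "1.1", "2.1")),
  humidity  = c("1.3", "3", "0.1"),
  timestamp = c(1468833354929, 1468833365186, 1468833378458),
  stringsAsFactors = FALSE
)

# TRUE for columns stored as factor or character
vec <- sapply(dat, function(x) is.factor(x) || is.character(x))

# Convert only those columns; as.character() first so that factors
# yield their labels rather than their internal integer codes
dat[vec] <- lapply(dat[vec], function(x) as.numeric(as.character(x)))

sapply(dat, class)
```

Any value that cannot be parsed as a number would become NA with a warning, so it is worth checking for unexpected NAs after the conversion.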

score 2 · Answer 3 · answered Jul 23 '16 at 14:37

We can use data.table:

library(data.table)
# Convert factor columns to numeric; all other columns are returned unchanged
setDT(df)[, lapply(.SD, function(x) if(is.factor(x)) as.numeric(as.character(x)) else x)]

akrun

score 1 · Answer 4 · answered Jul 23 '16 at 09:21

I think the best option is apply.

You can do:

cols <- c("particles", "humidity")  # names of the columns to convert
newD <- apply(data[, cols], 2, function(x) as.numeric(as.character(x)))

where cols holds the names of all the variables you want to convert. With 2 as the second argument, apply runs function(x) on every column of the first argument (with 1 it would run by rows). You can save the result as a new dataset, or overwrite the old columns with:

data[, cols] <- apply....

Jan Sila

It means that function(x) is applied to each column. For example, colMeans(data) is equivalent to apply(data, 2, mean), and rowMeans(data) is equivalent to apply(data, 1, mean). Do you see the difference? – Jan Sila Jul 23 '16 at 09:33
@teelou It's the `MARGIN` argument. `2` means the function will be applied over the columns of the data frame. If you want to apply a function over rows, you'd write `1`. – narendra-choudhary Jul 23 '16 at 09:33
Thanks @Narendra and Jan for the explanation. – Run Jul 23 '16 at 09:38
`apply` is not the best option. It is designed for use on matrices and will turn this data frame into a matrix. – Rich Scriven Aug 25 '16 at 18:45
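The point in the comment above can be checked directly. This is a small sketch on a hypothetical two-column data frame, comparing what apply() and lapply() hand back:

```r
# Illustrative data frame (hypothetical values)
dat <- data.frame(
  particles = factor(c("1", "1.1", "2.1")),
  humidity  = factor(c("1.3", "3", "0.1"))
)

# apply() coerces its input with as.matrix(), so the result is a matrix
via_apply <- apply(dat, 2, function(x) as.numeric(as.character(x)))
is.matrix(via_apply)

# lapply() works column by column; assigning into via_lapply[] keeps the data frame
via_lapply <- dat
via_lapply[] <- lapply(dat, function(x) as.numeric(as.character(x)))
is.data.frame(via_lapply)
```

Because as.matrix() forces every column to a single type, apply() becomes especially awkward once the data frame mixes factors with dates or other column types.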

score 1 · Answer 5 · answered Jul 23 '16 at 09:32

Use lapply:

cols <- c("particles", "nox", ...)

data[,cols] <- lapply(data[,cols], function(x) as.numeric(as.character(x)))

Sumedh
