
I'm using R and have a dataset with about 450 columns. I'm trying to figure out how to loop through all the columns and, if a column's values are categorical, recode that column's values.

attach(my_data)
for(i in names(my_data)){
    # how to check the data of each column
    my_data[[my_data[[i]]]] <- as.numeric(my_data[[i]])
}

That's what I've been able to work out so far, but I'm not sure how to check the data of each column.

user2743
  • If you mean you have numeric factors, you need to convert to character before you convert to numeric, e.g. `library(dplyr) ; my_data %>% mutate_if(is.factor, funs(as.numeric(as.character(.))))` – alistaire Jul 10 '16 at 01:13
  • You can check column type with `class(my_data[,i])`. See [this answer](http://stackoverflow.com/questions/21125222/determine-the-data-types-of-an-r-data-frames-columns) for other vectorized approaches. – pbee Jul 10 '16 at 02:32
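To illustrate the two comments above, here is a minimal sketch. The data frame `df` and its columns are hypothetical stand-ins for `my_data`, not taken from the question. It shows one way to list every column's class at once, and why a factor with numeric-looking labels should go through `as.character()` before `as.numeric()`.

```r
# Toy data frame (hypothetical; stands in for my_data)
df <- data.frame(a = 1:3,
                 b = factor(c("10", "20", "30")),
                 c = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

# Class of every column in one call
col_classes <- sapply(df, class)
print(col_classes)

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(df$b)

# Converting to character first recovers the numbers the labels spell out
values <- as.numeric(as.character(df$b))

print(codes)
print(values)
```

Here `codes` holds the level positions while `values` holds the numbers spelled out by the labels, which is the distinction the first comment is pointing at.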

2 Answers


We can also do this with `lapply`:

my_data[] <- lapply(my_data, function(x) if(is.factor(x))
                       as.numeric(as.character(x)) else x)
akrun
  • Before I run the code a few of the columns are typed as factor. After I run the code all those columns are typed as num, but all the data in those columns are all NAs. – user2743 Jul 11 '16 at 02:33
  • @user2743 Since you didn't provide an example, my guess is that those columns contain some non-numeric elements, which become NA when coerced to numeric. – akrun Jul 11 '16 at 04:11
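One way to test that guess is to check which factor levels fail to convert. This is only a sketch: the factor `x` and its labels are made up for illustration, since the question does not show the real data.

```r
# Hypothetical factor column mixing numeric-looking and non-numeric labels
x <- factor(c("1", "2", "n/a", "3", "unknown"))

# Try converting each level; the coercion warning is suppressed
# because NAs are exactly what we are looking for here
lev <- levels(x)
converted <- suppressWarnings(as.numeric(lev))

# Levels that cannot be read as numbers
bad_levels <- lev[is.na(converted)]
print(bad_levels)

# If the labels are genuinely categorical, integer level codes
# are one possible recoding (an assumption about what "recode" means)
codes <- as.integer(x)
print(codes)
```

If every level of a column lands in `bad_levels`, the column holds true category labels rather than numbers stored as a factor, and `as.numeric(as.character(x))` will return all NAs, as described in the comment above.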

You should precompute which columns are factors and then iterate through only those columns:

str(my_data);
## 'data.frame': 3 obs. of  4 variables:
##  $ V1: int  1 2 3
##  $ V2: Factor w/ 3 levels "4","5","6": 1 2 3
##  $ V3: chr  "a" "b" "c"
##  $ V4: Factor w/ 3 levels "7","8","9": 1 2 3
for (i in which(sapply(my_data,is.factor)))
    my_data[[i]] <- as.numeric(as.character(my_data[[i]]));
str(my_data);
## 'data.frame': 3 obs. of  4 variables:
##  $ V1: int  1 2 3
##  $ V2: num  4 5 6
##  $ V3: chr  "a" "b" "c"
##  $ V4: num  7 8 9

Data

my_data <- data.frame(V1=1:3,V2=factor(4:6),V3=letters[1:3],V4=factor(7:9),stringsAsFactors=FALSE);
bgoldst