Turn strings into categorical variables in R without specifying column names

Question

I have a data frame named df with 70 character variables. I am trying to write a function that turns all of these character columns into categorical variables without having to specify each column name. For example:

df
  fruits   cars 
1 apple    volvo
2 pear     bwm
3 apple    bwm
4 orange   volvo
5 orange   fiat

And my desired output looks as such:

df
  fruits   cars 
1 1        1
2 2        2
3 1        2
4 3        1
5 3        3

I have tried converting to a factor and then setting the levels, which worked on a single column when I did not use apply. Here was my attempt:

x <- apply(df$fruit, 2, factor)
levels(x) <- 1:length(levels(x))

It fails when I wrap it in a function:

label_num <- function(x){
assigned <- 1:length(levels(x))
return(assigned)
}
x <- apply(df, 2, factor)
apply(levels(x), 2, label_num)

I receive the following error:

Error in apply(levels(x), 2, label_num) : 
  dim(X) must have a positive length

Can someone help me solve this, please? I am very new to R. Many thanks.

score 3 · Accepted Answer · answered Jul 22 '20 at 15:40

I suggest looking into the dplyr package. You can do this quickly with mutate_if:

df <- data.frame(
  fruits = c('apple', 'pear', 'apple', 'orange', 'orange'),
  cars = c('volvo', 'bwm', 'bmw', 'volvo', 'fiat'),
  stringsAsFactors = FALSE
)

str(df)

'data.frame':   5 obs. of  2 variables:
 $ fruits: chr  "apple" "pear" "apple" "orange" ...
 $ cars  : chr  "volvo" "bwm" "bmw" "volvo" ...

library(dplyr)
dfFactors <- df %>% 
  mutate_if(is.character, as.factor)

str(dfFactors)

'data.frame':   5 obs. of  2 variables:
 $ fruits: Factor w/ 3 levels "apple","orange",..: 1 3 1 2 2
 $ cars  : Factor w/ 4 levels "bmw","bwm","fiat",..: 4 2 1 4 3

`mutate_if` has been superseded by `across()` as of dplyr 1.0.0, so use `mutate(across(where(is.character), as.factor))` instead. - Martin Gal, Jul 22 '20 at 15:44
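
A minimal sketch of the `across()` form mentioned in the comment above, assuming dplyr 1.0.0 or later is installed. Recent dplyr versions expect the predicate to be wrapped in `where()`. The helper `to_codes` is a hypothetical name, not part of dplyr; it numbers values in order of first appearance so the result matches the desired output in the question.

```r
library(dplyr)

df <- data.frame(
  fruits = c("apple", "pear", "apple", "orange", "orange"),
  cars = c("volvo", "bwm", "bwm", "volvo", "fiat"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: integer codes in order of first appearance
to_codes <- function(x) as.integer(factor(x, levels = unique(x)))

# Every character column becomes a factor, no column names needed
dfFactors <- df %>% mutate(across(where(is.character), as.factor))

# Every character column becomes integer codes instead
dfCodes <- df %>% mutate(across(where(is.character), to_codes))
```

`dfFactors` keeps the original labels as factor levels, while `dfCodes` holds plain integers, so pick whichever the downstream code expects.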

score 1 · Answer 2 · answered Jul 22 '20 at 15:41

Try this base R solution:

#Data
df <- structure(list(fruits = c("apple", "pear", "apple", "orange", 
"orange"), cars = c("volvo", "bwm", "bwm", "volvo", "fiat")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

#Code
as.data.frame(apply(df, 2, function(x) as.numeric(factor(x, levels = unique(x)))))

It will produce:

  fruits cars
1      1    1
2      2    2
3      1    2
4      3    1
5      3    3
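
One caveat with `apply()`: it converts the data frame to a matrix first, so every column is treated as the same type. That is fine when all columns are character, and it is also why `levels(x)` in the question comes back `NULL` and triggers the `dim(X)` error. A hedged alternative, sketched below, uses `lapply()` on the character columns only, so other columns are left alone. The `price` column is a hypothetical addition for illustration and is not in the question's data.

```r
df <- data.frame(
  fruits = c("apple", "pear", "apple", "orange", "orange"),
  cars = c("volvo", "bwm", "bwm", "volvo", "fiat"),
  price = c(1.5, 2, 1.5, 3, 3),  # hypothetical numeric column, not in the question
  stringsAsFactors = FALSE
)

# Find the character columns without naming them
is_chr <- vapply(df, is.character, logical(1))

# match(x, unique(x)) gives each value the position of its first appearance
df[is_chr] <- lapply(df[is_chr], function(x) match(x, unique(x)))

df
```

Here `fruits` and `cars` become integer codes in order of first appearance, and `price` is untouched.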

score 0 · Answer 3 · answered Jul 22 '20 at 15:42

A base R solution:

df <- read.table(text="  fruits   cars 
apple    volvo
pear     bwm
apple    bwm
orange   volvo
orange   fiat", header=TRUE, stringsAsFactors=FALSE)

x <- as.data.frame(lapply(df, function(x) factor(x, labels = seq_along(unique(x)))))
x
#  fruits cars
#1      1    3
#2      3    1
#3      1    1
#4      2    3
#5      2    2
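
The codes above differ from the desired output in the question because `factor()` sorts its levels by default, so the numbering follows alphabetical order rather than order of first appearance. A small sketch of the difference on the `fruits` column (the variable names are only for illustration):

```r
x <- c("apple", "pear", "apple", "orange", "orange")

# Default: levels are sorted (apple, orange, pear)
sorted_codes <- as.integer(factor(x))
sorted_codes
# 1 3 1 2 2

# Levels in order of first appearance (apple, pear, orange)
appearance_codes <- as.integer(factor(x, levels = unique(x)))
appearance_codes
# 1 2 1 3 3
```

If the numbering only needs to be consistent within a column, either form works; pass `levels = unique(x)` when the codes must match the order in which values first occur.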
