Use an R function to replace strings with factors in the same data frame

Question

I would like to transform certain columns with a specific string code to a factor in the same data.frame. However, I am stymied by the initial task of passing the data.frame column reference to my function. Working from examples here and its linked pages, I believe the following should work:

#feed string to function

set.seed(42)
df <- data.frame(
chr1 = sample(letters[1:4], 10, T),
chr2 = sample(letters[4:7], 10, T), 
stringsAsFactors = F
)

tofactor <- function(dat,column) {
  dat[,column] <- as.factor(dat[,column])
}

tofactor(df, "chr1")
typeof(df$chr1)

However, after this call df$chr1 is still a character vector. I have also tried referencing the column with double square brackets, without success.

Thanks for your assistance.

Previous poster is incorrect; your function is fine. Your problem is that you don't assign the result of the `tofactor` function to anything. Use `df$chr1 <- tofactor(df, "chr1")`. Also, use `=` and not `<-` inside your call to `data.frame()`. — cmaher, Jul 11 '17 at 21:37
Why doesn't the single line of the function accomplish the intended replacement? — Todd D, Jul 11 '17 at 21:47
it _does_ do the replacement, but your function is returning the value that is returned by `` `[<-` `` which, in this case, is just the result of `as.factor(dat[,column])`. to get the object that contains the replacement, you need to return `dat` instead. if your goal was simply to get the vector back, then I would tend to agree more with previous poster and call myself incorrect — rawr, Jul 11 '17 at 22:01
@Todd You should spend some time to study scoping in R. Changes inside a function (usually, with few special exceptions that you normally should avoid) don't affect objects outside a function. You should return the changed data.frame and assign it to the original data.frame. — Roland, Jul 11 '17 at 22:05
My plan was to place the column names in a vector and then `apply()` for each value in the vector. Based on these comments, it would seem this strategy may not work. — Todd D, Jul 11 '17 at 22:27
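The comments above point to two changes: have the function return the modified data frame, and assign that result back at the call site. A minimal sketch follows, which also covers the plan of converting several columns named in a vector. `tofactor2` and `cols` are names made up for this illustration and are not part of the original post.

```r
set.seed(42)
df <- data.frame(
  chr1 = sample(letters[1:4], 10, TRUE),
  chr2 = sample(letters[4:7], 10, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical variant of tofactor(): same conversion, but it returns
# the whole modified data frame instead of only the converted column.
tofactor2 <- function(dat, column) {
  dat[[column]] <- as.factor(dat[[column]])
  dat
}

# Changes made inside a function stay local, so assign the result back.
df <- tofactor2(df, "chr1")
class(df$chr1)

# For several columns, lapply() over the selected columns works without
# a helper function; `cols` is an illustrative vector of column names.
cols <- c("chr1", "chr2")
df[cols] <- lapply(df[cols], as.factor)
sapply(df, class)
```

Either form leaves `df` as a data frame whose selected columns are factors; the `lapply()` form is usually simpler than `apply()` here because it iterates over columns and keeps each column's type.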

score 0 · Answer 1 · answered Jul 11 '17 at 22:48

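The function does perform the conversion, but only on its local copy of the data frame, and it returns the converted column rather than the data frame. Assign the output back to that column in the original (or a new) df:

df$chr1 <- tofactor(df, "chr1")

If you run str(tofactor(df,"chr1")) you can see that the return value is a factor, not a data frame: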

Factor w/ 4 levels "a","b","c","d": 4 4 2 4 3 3 3 1 3 3

Mako212

score 0 · Accepted Answer · answered Jul 11 '17 at 22:52

Another way is to use mutate_at and specify the variables inside vars():

library(dplyr)

df <- data.frame(
  chr1 = sample(letters[1:4], 10, T),
  chr2 = sample(letters[4:7], 10, T), 
  stringsAsFactors = F
)

df2 <- df %>% 
 mutate_at(vars(chr1), as.factor)

class(df2$chr1) #[1] "factor"
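In dplyr 1.0.0 and later, `mutate_at()` is marked as superseded in favor of `across()`. The same conversion in that style might look like the sketch below, assuming a recent dplyr is installed; `cols` is an illustrative name that does not appear in the original answer.

```r
library(dplyr)

df <- data.frame(
  chr1 = sample(letters[1:4], 10, TRUE),
  chr2 = sample(letters[4:7], 10, TRUE),
  stringsAsFactors = FALSE
)

# Illustrative vector of column names to convert.
cols <- c("chr1", "chr2")

# all_of() selects exactly the columns named in `cols`;
# across() applies as.factor to each of them.
df2 <- df %>%
  mutate(across(all_of(cols), as.factor))

class(df2$chr1)
class(df$chr1)   # the original df is left unchanged
```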

score 0 · Answer 3 · answered Jul 17 '17 at 22:36

After understanding scope better and direction to assign() from a colleague, I've arrived at:

set.seed(42)
df <- data.frame(
  chr1 = sample(letters[1:4], 10, T),
  chr2 = sample(letters[4:7], 10, T), 
  stringsAsFactors = F
)

tofactor <- function(dat,column) {
  dat[,column] <- as.factor(dat[,column])
  assign("df",dat, envir = .GlobalEnv)
}

tofactor(df, "chr1")
class(df$chr1)  # "factor"; typeof() would report "integer" for a factor

This solution handles the replacement in the function, which allows for repeated use without having to assign the output in an additional step.
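One caveat, sketched below on the assumption that the function above is used unchanged: because the target name "df" is hard-coded in assign(), the result is always written to a global object called df, whichever data frame is passed in. `other` is a made-up second data frame used only for illustration.

```r
tofactor <- function(dat, column) {
  dat[, column] <- as.factor(dat[, column])
  assign("df", dat, envir = .GlobalEnv)
}

df <- data.frame(chr1 = c("a", "b"), stringsAsFactors = FALSE)
other <- data.frame(x = c("u", "v", "w"), stringsAsFactors = FALSE)

tofactor(other, "x")

class(other$x)   # still "character": `other` itself was not modified
names(df)        # "x": the global df now holds the converted copy of `other`
```

Returning the modified data frame and assigning it at the call site avoids this dependence on a fixed global name.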
