
I would like to transform certain columns with a specific string code to a factor in the same data.frame. However, I am stymied by the initial task of passing the data.frame column reference to my function. Working from examples here and its linked pages, I believe the following should work:

#feed string to function

set.seed(42)
df <- data.frame(
  chr1 = sample(letters[1:4], 10, T),
  chr2 = sample(letters[4:7], 10, T), 
  stringsAsFactors = F
)

tofactor <- function(dat,column) {
  dat[,column] <- as.factor(dat[,column])
}

tofactor(df, "chr1")
typeof(df$chr1)

However, after running this, df$chr1 is still stored as character. I have also tried referencing the column with double square brackets ([[ ]]), without success.

Thanks for your assistance.

rawr
Todd D
  • add `dat` as the final line in your function – rawr Jul 11 '17 at 21:34
  • Previous poster is incorrect; your function is fine. Your problem is that you don't assign the result of the `tofactor` function to anything. Use `df$chr1 <- tofactor(df, "chr1")`. Also, use `=` and not `<-` inside your call to `data.frame()`. – cmaher Jul 11 '17 at 21:37
  • 2
    previous poster's advice is silly; my way is better – rawr Jul 11 '17 at 21:41
  • Why doesn't the single line of the function accomplish the intended replacement? – Todd D Jul 11 '17 at 21:47
  • it _does_ do the replacement, but your function is returning the value that is returned by `[<-` which, in this case, is just the result of `as.factor(dat[,column])`. to get the object that contains the replacement, you need to return `dat` instead. if your goal was simply to get the vector back, then I would tend to agree more with previous poster and call myself incorrect – rawr Jul 11 '17 at 22:01
  • @Todd You should spend some time to study scoping in R. Changes inside a function (usually, with few special exceptions that you normally should avoid) don't affect objects outside a function. You should return the changed data.frame and assign it to the original data.frame. – Roland Jul 11 '17 at 22:05
  • My plan was to place the column names in a vector and then `apply()` for each value in the vector. Based on these comments, it would seem this strategy may not work. – Todd D Jul 11 '17 at 22:27
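To make the scoping point from the comments concrete: R functions work on a copy of the data frame (copy-on-modify), so the assignment inside the function changes only that local copy, and the function returns the value of its last expression. A minimal sketch of rawr's suggestion to return `dat`; the name `tofactor2` is hypothetical, and `[[` is used only to show that the bracket style is not the problem:

```r
set.seed(42)
df <- data.frame(
  chr1 = sample(letters[1:4], 10, TRUE),
  chr2 = sample(letters[4:7], 10, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical variant of tofactor(): modify the local copy, then return it
tofactor2 <- function(dat, column) {
  dat[[column]] <- as.factor(dat[[column]])
  dat  # return the modified copy, not the value of the assignment
}

tofactor2(df, "chr1")  # returns a modified data frame; df itself is unchanged
class(df$chr1)         # "character"

df <- tofactor2(df, "chr1")  # reassign to keep the change
class(df$chr1)               # "factor"
```

Because the function returns the whole data frame, calling it once per column name in a loop (`for (col in cols) df <- tofactor2(df, col)`) also works.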

3 Answers


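The function is working fine, but it returns the value of its last expression (the assignment), which is the converted column rather than the whole data frame. Assign that output back to the column (or to a new column):

df$chr1 <- tofactor(df, "chr1")

If you run str(tofactor(df, "chr1")) you can see the returned factor: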

Factor w/ 4 levels "a","b","c","d": 4 4 2 4 3 3 3 1 3 3
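The plan mentioned in the comments, converting several columns whose names are stored in a vector, can be handled in base R without a helper function: `lapply()` over the selected columns returns a list of factors that can be assigned back in one step. A sketch; `cols` is an illustrative name:

```r
set.seed(42)
df <- data.frame(
  chr1 = sample(letters[1:4], 10, TRUE),
  chr2 = sample(letters[4:7], 10, TRUE),
  stringsAsFactors = FALSE
)

cols <- c("chr1", "chr2")  # columns to convert

# lapply() returns a list of factors; assigning it to df[cols] replaces
# those columns and keeps df a data.frame
df[cols] <- lapply(df[cols], as.factor)

sapply(df, class)  # both columns are now "factor"
```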

Mako212

Another way is to use mutate_at() and specify the variables inside vars():

library(dplyr)

df <- data.frame(
  chr1 = sample(letters[1:4], 10, T),
  chr2 = sample(letters[4:7], 10, T), 
  stringsAsFactors = F
)

df2 <- df %>% 
 mutate_at(vars(chr1), as.factor)

class(df2$chr1) #[1] "factor"
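In dplyr 1.0.0 and later, scoped verbs such as `mutate_at()` are superseded by `across()`. A sketch of the same conversion driven by a character vector of column names, assuming dplyr >= 1.0.0 is installed:

```r
library(dplyr)

set.seed(42)
df <- data.frame(
  chr1 = sample(letters[1:4], 10, TRUE),
  chr2 = sample(letters[4:7], 10, TRUE),
  stringsAsFactors = FALSE
)

cols <- c("chr1", "chr2")  # columns to convert, given as strings

# all_of() selects columns named in a character vector;
# across() applies as.factor to each selected column
df2 <- df %>%
  mutate(across(all_of(cols), as.factor))

class(df2$chr1)  # "factor"
```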
roarkz

After reading up on scoping and getting a pointer to assign() from a colleague, I arrived at:

set.seed(42)
df <- data.frame(
  chr1 = sample(letters[1:4], 10, T),
  chr2 = sample(letters[4:7], 10, T), 
  stringsAsFactors = F
)

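tofactor <- function(dat, column) {
  # capture the caller's variable name (e.g. "df") before dat is modified
  name <- deparse(substitute(dat))
  dat[, column] <- as.factor(dat[, column])
  # write the modified copy back to the global environment under that name
  assign(name, dat, envir = .GlobalEnv)
}

tofactor(df, "chr1")
typeof(df$chr1)  # "integer": a factor is stored as integer codes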

This solution performs the replacement inside the function, so it can be reused without assigning the output in a separate step.
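An alternative to assign() into the global environment (which Roland's comment advises avoiding) is an R replacement function: a function whose name ends in `<-` is called as `f(x) <- value`, and R assigns the returned object back to `x` in the caller's scope. A sketch; the name `factor_cols<-` is hypothetical:

```r
# Hypothetical replacement function; the argument must be named `value`
# and here holds the names of the columns to convert
`factor_cols<-` <- function(x, value) {
  x[value] <- lapply(x[value], as.factor)
  x  # R assigns this back to the object on the left-hand side
}

set.seed(42)
df <- data.frame(
  chr1 = sample(letters[1:4], 10, TRUE),
  chr2 = sample(letters[4:7], 10, TRUE),
  stringsAsFactors = FALSE
)

factor_cols(df) <- "chr1"  # evaluated as df <- `factor_cols<-`(df, value = "chr1")
class(df$chr1)  # "factor"
class(df$chr2)  # "character"
```

The call still needs no separate assignment step, but the change lands in whatever scope the call is made in rather than always in the global environment.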

Todd D