
I have a data frame:

division | category
  A      |    tools
  A      |    work
  B      |    tools
  B      |    TOOLS

Both columns are factor variables. How do I convert TOOLS to tools?

I tried

df$category <- as.character(df$category)
df$category <- lapply(df$category, function(x) { tolower(x) } )
df$category <- as.factor(df$category)

but the last command fails with:

Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

What does that mean?

KillerSnail

2 Answers


The error means that you tried to create a factor from a list, although the message doesn't say so in those words. It is triggered because you used lapply(), which always returns a list. In this situation as.factor() calls factor(), which in turn calls sort.list() here:

## from factor()
if (missing(levels)) {
    y <- unique(x, nmax = nmax)
    ind <- sort.list(y)
    ...
}

which is where the error occurs.

as.factor(list(1, 2))
# Error in sort.list(y) : 'x' must be atomic for 'sort.list'
# Have you called 'sort' on a list?
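A minimal sketch of the difference, rebuilding the question's data frame by hand (the values come from the question; the names `res_list`, `res_vec`, `res_vapply` and `res_unlist` are just illustrative). It contrasts the list that lapply() returns with the atomic character vectors you get from tolower() directly, from vapply() with a declared `character(1)` result, or from unlist():

```r
# Rebuild the example data from the question (both columns as factors).
df <- data.frame(
  division = factor(c("A", "A", "B", "B")),
  category = factor(c("tools", "work", "tools", "TOOLS"))
)

# lapply() always returns a list, even when each element is a single string.
res_list <- lapply(as.character(df$category), tolower)
class(res_list)      # "list"

# tolower() is vectorized and coerces the factor to character itself,
# so it returns an atomic character vector directly.
res_vec <- tolower(df$category)
class(res_vec)       # "character"

# If an apply-style loop is really needed, vapply() with a declared
# character(1) return type also yields an atomic vector.
res_vapply <- vapply(as.character(df$category), tolower, character(1),
                     USE.NAMES = FALSE)

# unlist() flattens the list from lapply() back into an atomic vector.
res_unlist <- unlist(res_list)

# Any of the atomic results can be passed to factor() without error.
df$category <- factor(res_vec)
levels(df$category)  # "tools" "work"
```

Of these, calling tolower() directly is the simplest; vapply() and unlist() are only shown to illustrate that the problem is the list type, not the lowercasing.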

Long story short, you don't need lapply() at all: tolower() is vectorized and coerces its input to character for you, so the separate as.character() step is unnecessary too.

df$category <- factor(tolower(df$category))
df
#   division category
# 1        A    tools
# 2        A     work
# 3        B    tools
# 4        B    tools
Rich Scriven
  • Also, a more specific solution for `TOOLS` could be `factor(gsub("TOOLS", "tools", df$category, fixed = TRUE))` – David Arenburg Aug 30 '15 at 17:36
  • Or `df$category[df$category == "TOOLS"] <- "tools"`, and then drop the unused level – Rich Scriven Aug 30 '15 at 17:50
  • See also [this](http://stackoverflow.com/questions/28181753/grouping-factor-levels-in-an-r-data-table) question of mine or [this](http://stackoverflow.com/questions/19410108/cleaning-up-factor-levels-collapsing-multiple-levels-labels) related question for how to group factors more generally – MichaelChirico Aug 30 '15 at 18:15
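The two alternatives from the comments can be sketched side by side as follows, again rebuilding the question's data by hand (the names `cat1` and `cat2` are illustrative, and using droplevels() for the "drop the unused level" step is an assumption about how that would be done). Both produce the same factor as the tolower() answer for this data:

```r
# Example data from the question, with category stored as a factor.
df <- data.frame(
  division = factor(c("A", "A", "B", "B")),
  category = factor(c("tools", "work", "tools", "TOOLS"))
)

# Option 1: replace only the exact string "TOOLS".
# gsub() coerces the factor to character; fixed = TRUE matches literally.
cat1 <- factor(gsub("TOOLS", "tools", df$category, fixed = TRUE))

# Option 2: overwrite the matching elements in place.
# "tools" is already a level, so assigning it to a factor element is valid;
# the now-empty "TOOLS" level is then removed with droplevels().
cat2 <- df$category
cat2[cat2 == "TOOLS"] <- "tools"
cat2 <- droplevels(cat2)

levels(cat1)  # "tools" "work"
levels(cat2)  # "tools" "work"
```

Unlike tolower(), both options only fix the literal value "TOOLS"; another spelling such as "Tools" would be left untouched.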

I don't think you need lapply(); tolower() works on the whole column at once. This worked for me:

division <- c("a", "a", "b", "b")
category <- c("tools", "work", "tools", "TOOLS")

df <- data.frame(division, category)
df$category <- tolower(df$category)

> as.factor(df$category)
[1] tools work  tools tools
Levels: tools work
RayVelcoro