I’m new to R and I’m trying to learn fundamentals. I’m facing a small problem for which I can’t find a solution online.

What I want to do: write a function that converts to lower case every column of my data.frame that meets a condition (class = factor).

This code works, but it applies to all my columns:

lower = function(x) { data.frame(tolower(as.matrix(x))) }

I need something more like this, but it doesn’t work:

lower = function (x) { 
  for (i in 1:length(x)) {
    if (class(i)=="factor") {
         data.frame(tolower(as.matrix(x))) 
    }
  }
}

x is my data.frame.

demongolem
Y.P
    Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269) . This will make it much easier for others to help you. – Jaap Oct 31 '16 at 15:09

2 Answers

4

Your attempt is close; returning the result is the tricky part. Since there is no sample data, here is an example using iris and changing to upper case instead:

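# Upper-case factor columns; leave every other column unchanged
as.data.frame(lapply(head(iris), function(x) {
  if (is.factor(x)) {  # is.factor() also handles ordered factors
    toupper(x)         # toupper() returns a character vector
  } else {
    x
  }
}))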

lapply is an efficient way to loop through list data (and a data.frame is inherently a list). as.data.frame is necessary to convert back to a data.frame.
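For comparison, the loop from the question can be repaired along the same lines. The sketch below is only an illustration (the function name `lower_factors` and the sample data are made up): the original loop tested `class(i)`, where `i` is just an integer index, and never stored or returned a result. Here `is.factor()` is used for the test because `class()` can return more than one string, for example for ordered factors.

```r
# Hypothetical helper: a corrected version of the loop from the question
lower_factors <- function(x) {
  for (i in seq_along(x)) {
    # x[[i]] is the i-th column; i itself is only an integer index
    if (is.factor(x[[i]])) {
      x[[i]] <- tolower(x[[i]])  # the column becomes a character vector
    }
  }
  x  # return the modified data.frame
}

# Made-up sample data
df <- data.frame(id = 1:3, grp = factor(c("A", "B", "A")))
res <- lower_factors(df)
str(res)
```

Note that the factor column comes back as character; whether that matters depends on what is done with the data afterwards.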

However, there are even better tools available that save you from writing the loop yourself, including mutate_if from dplyr:

library(dplyr)

head(iris) %>%
  mutate_if(is.factor, toupper)

Both of these return:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  SETOSA
2          4.9         3.0          1.4         0.2  SETOSA
3          4.7         3.2          1.3         0.2  SETOSA
4          4.6         3.1          1.5         0.2  SETOSA
5          5.0         3.6          1.4         0.2  SETOSA
6          5.4         3.9          1.7         0.4  SETOSA
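As a side note, in dplyr 1.0.0 and later `mutate_if` is superseded by `across()`. Assuming such a version is installed, the same transformation could be sketched as:

```r
library(dplyr)

# Same transformation written with across(); assumes dplyr >= 1.0.0
out <- head(iris) %>%
  mutate(across(where(is.factor), toupper))

str(out$Species)  # Species is now a character column
```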
Mark Peterson
  • Thank you very much Mark !! Your answer is clear and simple :) – Y.P Oct 31 '16 at 16:13
  • The value returned by `toupper` or `tolower` is always a character vector, regardless of what you run it through. – Mark Peterson Oct 31 '16 at 16:16
  • Yes Mark, you should be aware of the side effects of any manipulations. – Pierre L Oct 31 '16 at 16:20
  • No argument there. Your answer does return a factor, though OP's initial code did not, and that was not a stated requirement. My non-dplyr solution only returns a factor due to the default of `stringsAsFactors`, and the levels will not (necessarily) match up to the initial levels. The question, more broadly, is about applying functions to a subset of columns; the behavior of that function is a separate (important) issue – Mark Peterson Oct 31 '16 at 16:27
1

Note that the solutions below return factor columns as factors and do not re-order the factor levels.

The approach in the question could re-order factor levels because that code changes the factor columns to character and then back to factor using default ordering (which may not have been the initial ordering). For example, this shows what can happen if tolower is applied directly to a factor:

fac <- factor(c("One", "None", "Two"), levels = c("One", "Two", "None"))

fac
## [1] One  None Two 
## Levels: One Two None

factor(tolower(fac)) # order of levels has changed!
## [1] one  none two 
## Levels: none one two

In particular, that implies that sort(fac) and sort(factor(tolower(fac))) put the elements in different orders.
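A quick way to see the difference is to sort both versions and compare the results as plain strings. This is a small sketch reusing the `fac` example from above:

```r
fac <- factor(c("One", "None", "Two"), levels = c("One", "Two", "None"))

# sort() on a factor follows the level order, not alphabetical order
sorted_orig  <- as.character(sort(fac))
sorted_lower <- as.character(sort(factor(tolower(fac))))

sorted_orig   # "One" "Two" "None"
sorted_lower  # "none" "one" "two"
```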

We now discuss some alternative solutions to the question which do not have the reordering problem.

1) Create a function lc_lev which lower cases the levels of a factor and passes the input through unchanged if it is not a factor. Then lapply it over the columns of the input -- here we use the built-in CO2 -- and change it back to a data.frame of the same shape:

lc_lev <- function(x) {
  if (is.factor(x)) levels(x) <- tolower(levels(x))
  x
}

replace(CO2, TRUE, lapply(CO2, lc_lev))

1a) This would also work:

CO2[] <- lapply(CO2, lc_lev)
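To check that this keeps both the factor class and the level order, here is a minimal sketch on a made-up data.frame (the column names `size` and `n` are only for illustration):

```r
lc_lev <- function(x) {
  if (is.factor(x)) levels(x) <- tolower(levels(x))
  x
}

# Made-up data with a deliberately non-alphabetical level order
df <- data.frame(
  size = factor(c("Small", "Large", "Medium"),
                levels = c("Small", "Medium", "Large")),
  n = c(1, 2, 3)
)

out <- df
out[] <- lapply(out, lc_lev)  # df[] <- ... keeps the data.frame structure

is.factor(out$size)  # still a factor
levels(out$size)     # "small" "medium" "large"
```

One caveat: if two levels differ only in case (say "A" and "a"), assigning the lower-cased levels merges them into a single level.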

2) Another approach is to use S3. The generic (first line) dispatches to the factor method if the input is a factor and to the default method otherwise:

lc_lev2 <- function(x, ...) UseMethod("lc_lev2")
lc_lev2.factor <- function(x, ...) { levels(x) <- tolower(levels(x)); x }
lc_lev2.default <- identity

CO2[] <- lapply(CO2, lc_lev2)
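Because dispatch looks at the whole class vector, the factor method is also used for ordered factors, whose class is c("ordered", "factor"). The following sketch uses made-up inputs and writes the default method out explicitly instead of using identity:

```r
lc_lev2 <- function(x, ...) UseMethod("lc_lev2")
lc_lev2.factor <- function(x, ...) { levels(x) <- tolower(levels(x)); x }
lc_lev2.default <- function(x, ...) x  # pass everything else through

# Made-up inputs: an ordered factor and a numeric vector
ord <- factor(c("Low", "High"), levels = c("Low", "High"), ordered = TRUE)

res_ord <- lc_lev2(ord)        # dispatches to lc_lev2.factor
res_num <- lc_lev2(c(1.5, 2))  # dispatches to lc_lev2.default

levels(res_ord)      # "low" "high"
is.ordered(res_ord)  # TRUE
```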
G. Grothendieck