58

I need to replace the levels of a factor column in a dataframe. Using the iris dataset as an example, how would I replace any cells which contain virginica with setosa in the Species column?

I expected the following to work, but it generates a warning message and simply inserts NAs:

iris$Species[iris$Species == 'virginica'] <- 'setosa'
luciano
  • Your example with `iris` just works. Can you replicate your problem in some other way? At the moment it's hard to understand what you want to do. – Andrie Aug 04 '12 at 17:34
  • Works for me. Which warning message do you get? – sgibb Aug 04 '12 at 17:34
  • It worked with iris when I tried again. However, applying the same to my dataset gives this: Warning message: In `[<-.factor`(`*tmp*`, x$Hweet == "hweet", value = c(NA_integer_, : invalid factor level, NAs generated – luciano Aug 04 '12 at 17:42
  • I strongly suspect that you want to operate on the *levels* of the factor rather than on the elements themselves ... based on your previous (very similar) question, I think you might get a bit farther by asking a *slightly* longer, **reproducible**, more complete question that explains what you're trying to do ... – Ben Bolker Aug 04 '12 at 17:44

9 Answers

109

I bet the problem is when you are trying to replace values with a new one, one that is not currently part of the existing factor's levels:

levels(iris$Species)
# [1] "setosa"     "versicolor" "virginica" 

Your example with iris actually works, because 'setosa' is already one of the levels:

iris$Species[iris$Species == 'virginica'] <- 'setosa'

This is what more likely creates the problem you were seeing with your own data:

iris$Species[iris$Species == 'virginica'] <- 'new.species'
# Warning message:
# In `[<-.factor`(`*tmp*`, iris$Species == "virginica", value = c(1L,  :
#   invalid factor level, NAs generated

It will work if you first add the new value to the factor's levels:

levels(iris$Species) <- c(levels(iris$Species), "new.species")
iris$Species[iris$Species == 'virginica'] <- 'new.species'

If you want to replace "species A" with "species B" you'd be better off with

levels(iris$Species)[match("oldspecies",levels(iris$Species))] <- "newspecies"
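A short sketch of the difference between the two approaches, run on copies of the column so the built-in iris dataset stays untouched. The variable names (`bad`, `added`, `renamed`) and the level name "new.species" are only illustrative.

```r
# Assigning a value that is not an existing level produces NAs.
# suppressWarnings() only hides the "invalid factor level" warning here.
bad <- iris$Species
suppressWarnings(bad[bad == "virginica"] <- "new.species")
sum(is.na(bad))    # the 50 former "virginica" rows are now NA

# Approach 1: add the level first, then assign the elements.
added <- iris$Species
levels(added) <- c(levels(added), "new.species")
added[added == "virginica"] <- "new.species"
levels(added)      # "virginica" is still listed as a (now unused) level

# Approach 2: rename the level itself; no unused level is left behind.
renamed <- iris$Species
levels(renamed)[match("virginica", levels(renamed))] <- "new.species"
levels(renamed)
```

With the first approach, `droplevels(added)` can be used afterwards to discard the unused "virginica" level.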
zx8754
flodel
  • but if you want to replace species A with species B you'd be better off with `levels(iris$Species)[match("oldspecies",levels(iris$Species))] <- "newspecies"` – Ben Bolker Aug 04 '12 at 17:55
22

For what you are describing, you can simply rename the level with levels():

levels(iris$Species)[3] <- 'new'
Greg Snow
13

You can use the function revalue from the package plyr to replace values in a factor vector.

In your example, to replace the level virginica with setosa:

 data(iris)
 library(plyr)
 iris$Species <- revalue(iris$Species, c("virginica" = "setosa"))
emudrak
nebi
6

I had the same problem. This worked better:

Identify which level you want to modify: levels(iris$Species)

    "setosa" "versicolor" "virginica" 

So, setosa is the first.

Then, write this:

     levels(iris$Species)[1] <- "new name"
Koby Douek
PriHoh
6

Using dplyr::mutate and forcats::fct_recode:

library(dplyr)
library(forcats)

iris <- iris %>%  
  mutate(Species = fct_recode(Species,
    "Virginica" = "virginica",
    "Versicolor" = "versicolor"
  )) 

iris %>% 
  count(Species)

# A tibble: 3 x 2
     Species     n
      <fctr> <int>
1     setosa    50
2 Versicolor    50
3  Virginica    50   
sbha
3

A more general solution, which works on the whole data frame at once and does not require adding new factor levels, is:

data.mtx <- as.matrix(data.df)
data.mtx[which(data.mtx == "old.value.to.replace")] <- "new.value"
data.df <- as.data.frame(data.mtx)

A nice feature of this code is that you can assign several values at once (up to as many as there are cells in the original data frame), not only a single "new.value", and the new values can even be random. Thus you can create a completely new random data frame with the same size as the original.
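One thing to keep in mind with the matrix round trip: `as.matrix()` on a data frame with mixed column types produces a character matrix, so numeric columns do not come back as numeric. A minimal sketch on a made-up three-row data frame (the column names and values are assumptions for illustration only):

```r
# Small illustrative data frame with one factor and one numeric column
data.df <- data.frame(
  species = factor(c("setosa", "virginica", "versicolor")),
  length  = c(5.1, 6.3, 7.0)
)

data.mtx <- as.matrix(data.df)                 # everything becomes character
data.mtx[data.mtx == "virginica"] <- "setosa"  # replace across all columns

# stringsAsFactors = TRUE is set explicitly so the result does not depend
# on the R version's default
round.trip <- as.data.frame(data.mtx, stringsAsFactors = TRUE)
str(round.trip)                                # 'length' is no longer numeric

# Restore the numeric column by hand if it is still needed as a number
round.trip$length <- as.numeric(as.character(round.trip$length))
```

For a data frame with many numeric columns, renaming the factor levels directly avoids this conversion step.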

alejandro
2

You want to replace the values in a dataset column, but you're getting a warning like this:

invalid factor level, NA generated

Try this instead:

levels(dataframe$column)[levels(dataframe$column)=='old_value'] <- 'new_value'

JColares
0

In case you have to replace multiple values and if you don't mind "refactoring" your variable with as.factor(as.character(...)) you could try the following:

replace.values <- function(search, replace, x){
  stopifnot(length(search) == length(replace))
  xnew <- replace[ match(x, search) ]
  takeOld <- is.na(xnew) & !is.na(x)
  xnew[takeOld] <- x[takeOld]
  return(xnew)
}

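# Convert to character, replace the values, then turn the result back into a factor
iris$Species <- as.factor(replace.values(search=c("oldValue1","oldValue2"),
                                         replace=c("newValue1","newValue2"),
                                         x=as.character(iris$Species)))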
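If converting to character and back is not wanted, the same kind of multi-value replacement can be sketched directly on the levels. This is an alternative under stated assumptions, not the approach above; `rename_levels` is a hypothetical helper name, and search values that are not existing levels are silently ignored.

```r
# Hypothetical helper: rename several factor levels in one call
rename_levels <- function(f, search, replace) {
  stopifnot(is.factor(f), length(search) == length(replace))
  idx   <- match(search, levels(f))  # position of each old name among the levels
  found <- !is.na(idx)               # skip search values that are not levels
  levels(f)[idx[found]] <- replace[found]
  f
}

species <- rename_levels(iris$Species,
                         search  = c("virginica", "versicolor", "absent"),
                         replace = c("Virginica", "Versicolor", "ignored"))
levels(species)
```

Because only the levels are touched, the underlying integer codes and the order of the levels stay as they were.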
Daniel Hoop
0

levels(iris$Species)

levels(iris$Species)[3] <- 'setosa'
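Since "setosa" is already a level, this assignment merges the third level into the first instead of creating a duplicate. A quick check on a copy of the column (the name `sp` is only illustrative):

```r
sp <- iris$Species          # copy, so iris itself is not modified
levels(sp)[3] <- "setosa"   # rename "virginica" to an already existing level

levels(sp)                  # only two levels remain
table(sp)                   # the former "virginica" rows are counted under "setosa"
```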

Kon Li