
I want to replace row 22907, column 1 with a string. The column is made up of factors. Column 1 in the dataframe is called geneID. I have tried the following:

df[22907,1] == 'CDEF'

But that gave the following warning:

Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "OKSM") :
  invalid factor level, NA generated

I understand how to replace all NAs with a value; however, I am only looking to replace this specific one.

Edit: Pretty sure this question is not a duplicate of the one linked - we got similar errors however the base question was different. This explains how to replace a single value in a dataframe.

2 Answers


We convert the column to character and then do the assignment:

df[[1]] <- as.character(df[[1]])
df[22907,1] <- 'CDEF'

Or, if we need to keep it as a factor, add 'CDEF' to the levels of the column before the assignment:

levels(df[[1]]) <- c(levels(df[[1]]), 'CDEF')
df[22907,1] <- 'CDEF'
df[22907, 1]
#[1] CDEF
#Levels: A B C D E CDEF
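
The two routes above can be compared side by side in a minimal sketch. The data frame `toy` and the copies `chr` and `fac` are hypothetical names used only for illustration, not the asker's data. It assumes that, if the character route is taken, the column may later be turned back into a factor with `factor()`.

```r
# Hypothetical toy data; stringsAsFactors = TRUE makes geneID a factor
toy <- data.frame(geneID = c("B", "D", NA),
                  col2 = c(0.1, 0.2, 0.3),
                  stringsAsFactors = TRUE)

# Route 1: convert to character, assign, then rebuild the factor
chr <- toy
chr$geneID <- as.character(chr$geneID)
chr[3, "geneID"] <- "CDEF"
chr$geneID <- factor(chr$geneID)  # levels are re-derived from the values, in sorted order

# Route 2: append the new level to the existing ones, then assign
fac <- toy
levels(fac$geneID) <- c(levels(fac$geneID), "CDEF")
fac[3, "geneID"] <- "CDEF"

levels(chr$geneID)  # rebuilt levels
levels(fac$geneID)  # original levels with "CDEF" appended last
```

Both routes store the same values; they can differ in the order of the levels, which matters if anything downstream (plots, model contrasts) depends on level order.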

data

set.seed(24)
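# stringsAsFactors = TRUE is needed in R >= 4.0.0, where data.frame()
# no longer converts strings to factors by default
df <- data.frame(geneID = sample(LETTERS[1:5], 30000, replace = TRUE), 
            col2 = rnorm(30000), stringsAsFactors = TRUE)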
df[22907, 1] <- NA
akrun

You will need to add the level 'CDEF' first, then you can assign it - a factor only accepts values that already exist among its levels. Also, == tests for equality and doesn't assign; use = or, better yet, <-. It's also good practice to refer to columns by name rather than position (df[22907, 'geneID']).

levels(df[,1]) <- c(levels(df[,1]), 'CDEF')
df[22907,1] <- 'CDEF'
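
A small sketch of the points above, on a hypothetical three-row data frame `toy` (the names `toy`, `cmp` and `bad` are made up for illustration). It shows that `==` only compares, that assigning a value with no matching level yields NA, and that assignment by column name works once the level exists.

```r
# Hypothetical toy data with a factor column
toy <- data.frame(geneID = factor(c("A", "B", NA)),
                  col2 = c(1.5, 2.5, 3.5))

# `==` is a comparison: it returns a logical and leaves toy unchanged
cmp <- toy[3, "geneID"] == "CDEF"

# Assigning a value that is not yet a level turns the cell into NA
# (the "invalid factor level" warning is suppressed here)
bad <- toy
suppressWarnings(bad[1, "geneID"] <- "CDEF")
is.na(bad[1, "geneID"])

# Add the level first, then assign, addressing the column by name
levels(toy$geneID) <- c(levels(toy$geneID), "CDEF")
toy[3, "geneID"] <- "CDEF"
toy[3, "geneID"]
```

Addressing the column by name keeps the code working if the column order of the data frame changes later.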

You can read more here about how to handle and think about factors: https://swcarpentry.github.io/r-novice-inflammation/12-supp-factors/

rg255
  • Wasn't there when I wrote it – rg255 May 28 '18 at 06:26
  • Yes, and in that time I was writing my answer on a phone - there was no way to see what had happened on your answer. I haven't stolen your answer. – rg255 May 28 '18 at 06:29