Replace blanks in an existing column conditional on two factors in the dataframe

Question

I currently have a 90,000-row dataframe of species taxonomic data that dates back to 2000. In many of these rows (on the order of 5,000), the species name was left blank, and I want to use the descriptors in two other columns to designate morphospecies names. This means I need to add levels to a factor in an existing column, conditional on the levels of two other factors. The data looks like:

lepfam        lepnotes           lepsp
Aididae       green/spikes    
Aididae       greeen/nospikes 
Aididae       black/orangespots
Nymphalidae                       Amastus coccinator

The output should look like:

lepfam        lepnotes             lepsp
Aididae       green_spikes         Aididae morphosp1
Aididae       greeen_nospikes      Aididae morphosp2
Aididae       black_orangespots    Aididae morphosp3
Nymphalidae                        Amastus coccinator

I have tried the following code:

file$lepsp[file$lepfam == "Aididae" & file$lepnotes == "green_spikes"] <- "Aididae morphosp1"

And I get the following warning:

Warning message:
In `[<-.factor`(`*tmp*`, file$lepfam == "Aididae" & file$lepcn ==  :
invalid factor level, NA generated

Then I found the following Stack Overflow response, "Replace contents of factor column in R dataframe", with the following solution:

 levels(iris$Species) <- c(levels(iris$Species), "new.species")
 iris$Species[iris$Species == 'virginica'] <- 'new.species'

But this is not helpful to me because I would have to list thousands of new factor levels in the code. Is there an efficient way to fill in new values that add thousands of new factor levels to an existing column? Or should I make a new column with this information and merge the existing lepsp factor levels with the new ones?

Why not convert it to a character first with `as.character`? That way you can add any descriptions you want without having to re-factorize it — Mike H., May 29 '17 at 18:21
Perhaps convert everything to `character`, do the transformation, and then convert back to `factor` if you need to at the end — Andrew Gustar, May 29 '17 at 18:21
That worked. Thank you. I changed the lepsp factor to a character and it worked fine. Should I remove this question? — Danielle, May 29 '17 at 18:29
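
For anyone landing here later, the approach from the comments could be sketched roughly as below. This is only an illustration on toy data shaped like the example in the question: the `combos` lookup table and the numbering scheme (distinct `lepnotes` values numbered within each `lepfam`, in order of first appearance) are assumptions and not something the question specifies. The data frame is called `file` only because the question uses that name.

```r
# Toy data mirroring the question; stringsAsFactors = TRUE reproduces the factor columns.
file <- data.frame(
  lepfam   = c("Aididae", "Aididae", "Aididae", "Nymphalidae"),
  lepnotes = c("green_spikes", "greeen_nospikes", "black_orangespots", ""),
  lepsp    = c("", "", "", "Amastus coccinator"),
  stringsAsFactors = TRUE
)

# Step 1: work on character vectors, so new values need no predeclared factor levels.
file$lepsp <- as.character(file$lepsp)
fam   <- as.character(file$lepfam)
notes <- as.character(file$lepnotes)

# Step 2: rows needing a morphospecies name (blank species, non-blank notes).
blank <- (is.na(file$lepsp) | file$lepsp == "") & !is.na(notes) & notes != ""

# Step 3: one row per distinct family/notes pair, numbered within each family
# in order of first appearance (an assumed numbering scheme).
combos <- unique(data.frame(fam = fam[blank], notes = notes[blank]))
combos$n <- ave(seq_len(nrow(combos)), combos$fam, FUN = seq_along)
combos$name <- paste0(combos$fam, " morphosp", combos$n)

# Step 4: look up the generated name for every blank row and fill it in.
m <- match(paste(fam[blank], notes[blank], sep = "\t"),
           paste(combos$fam, combos$notes, sep = "\t"))
file$lepsp[blank] <- combos$name[m]

# Step 5 (optional): convert back to a factor once all values are in place.
file$lepsp <- factor(file$lepsp)

print(file)
```

Because the names are generated from the data, no levels have to be listed by hand. Misspelled notes such as "greeen_nospikes" are treated as their own morphospecies, so it may be worth cleaning `lepnotes` first.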
