I currently have a 90,000-row data frame of species taxonomic data dating back to 2000. For many of these rows (on the order of 5,000), the species name was left blank, and I want to use the descriptors in two other columns to assign morphospecies names. This means I need to add levels to a factor in an existing column, conditional on the levels of two other factors. The data look like:
lepfam lepnotes lepsp
Aididae green/spikes
Aididae greeen/nospikes
Aididae black/orangespots
Nymphalidae Amastus coccinator
The output should look like:
lepfam lepnotes lepsp
Aididae green_spikes Aididae morphosp1
Aididae greeen_nospikes Aididae morphosp2
Aididae black_orangespots Aididae morphosp3
Nymphalidae Amastus coccinator
I have tried the following code:
file$lepsp[file$lepfam == "Aididae" & file$lepnotes == "green_spikes"] <- "Aididae morphosp1"
And I get the following warning:
Warning message:
In `[<-.factor`(`*tmp*`, file$lepfam == "Aididae" & file$lepcn == :
invalid factor level, NA generated
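For reference, the same warning can be reproduced on a toy data frame. This is only a minimal sketch: the data frame `toy` and its values are made up for illustration, and it assumes `lepsp` is stored as a factor, as the warning suggests.

```r
# Hypothetical toy data; lepsp is a factor with a single level.
toy <- data.frame(
  lepfam   = factor(c("Aididae", "Nymphalidae")),
  lepnotes = factor(c("green_spikes", "")),
  lepsp    = factor(c(NA, "Amastus coccinator"))
)

# Assigning a string that is not an existing level produces the
# "invalid factor level, NA generated" warning and stores NA.
toy$lepsp[toy$lepfam == "Aididae" & toy$lepnotes == "green_spikes"] <- "Aididae morphosp1"

is.na(toy$lepsp[1])  # the new value was not stored
levels(toy$lepsp)    # the level set is unchanged
```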
Then I found the following Stack Overflow answer, "Replace contents of factor column in R dataframe", with this solution:
levels(iris$Species) <- c(levels(iris$Species), "new.species")
iris$Species[iris$Species == 'virginica'] <- 'new.species'
But this does not help me, because I would have to list thousands of new factor levels in the code. Is there an efficient way to fill in new values that generates thousands of new factor levels in an existing column? Or should I create a new column with this information and merge the existing lepsp factor levels with the new ones?
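To make the goal concrete, here is a rough sketch of the kind of approach I have in mind, not a confirmed solution. It assumes that blank species names are stored as `NA` or `""`, that each distinct `lepnotes` value within a family should become its own morphospecies, and that numbering by order of first appearance is acceptable. The example data frame is made up to mirror the tables above; the idea is to work on character vectors, so no levels have to be declared by hand, and convert back to a factor at the end.

```r
# Hypothetical example data modelled on the tables above.
file <- data.frame(
  lepfam   = c("Aididae", "Aididae", "Aididae", "Aididae", "Nymphalidae"),
  lepnotes = c("green_spikes", "greeen_nospikes", "black_orangespots",
               "green_spikes", ""),
  lepsp    = c("", "", "", "", "Amastus coccinator"),
  stringsAsFactors = TRUE
)

# Work on character vectors so new values need no pre-declared levels.
sp    <- as.character(file$lepsp)
fam   <- as.character(file$lepfam)
notes <- as.character(file$lepnotes)

# Rows whose species name is missing or blank (assumed encoding of "blank").
blank <- is.na(sp) | sp == ""

# Within each family, number the distinct descriptors in order of first
# appearance. ave() keeps the type of its first argument, so idx is a
# character vector of indices, which is fine for pasting.
idx <- ave(notes[blank], fam[blank],
           FUN = function(x) match(x, unique(x)))

# Build the morphospecies names and rebuild the factor from the result.
sp[blank]  <- paste0(fam[blank], " morphosp", idx)
file$lepsp <- factor(sp)
```

In this sketch, rows that share a family and descriptor receive the same morphospecies name, and a misspelled descriptor such as `greeen_nospikes` is treated as a separate morphospecies, so the notes would need cleaning first if that is not intended.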