I have a data frame like:

       Domain         Phylum          Class          Order
ID_1 Bacteria  Cyanobacteria Unclassified_c Unclassified_o
ID_2 Bacteria  Cyanobacteria Unclassified_c Unclassified_o
ID_3 Bacteria  Bacteroidetes Unclassified_c Unclassified_o
ID_4 Bacteria Proteobacteria Unclassified_c Unclassified_o
ID_5 Bacteria  Bacteroidetes Unclassified_c Unclassified_o

and I want to replace every occurrence of the values Unclassified_c, Unclassified_o, element_3, etc. with NA, so I tried:

df[df == "Unclassified_c" ] <- NA

This works well when I replace one value at a time, but sometimes there are too many values. So I would like to define a list of patterns and then use it, something like:

Remove_list <- c("Unclassified_c", "Unclassified_o", "element_3", "element_4", "element_x")

and then use the list to replace those values with NA:

df[ df == Remove_list ] <- NA 

It changes some of the values to NA, but not all. I don't want to use the stringr library, because it eliminates the row names (ID_1 .. ID_x) and I need them, so I would like to use base R. Any suggestions?

Thanks so much!

Jaap
abraham

1 Answer

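The comparison df == Remove_list checks the values element by element, recycling Remove_list across the data frame, rather than testing whether each value is a member of the list, which is why only some cells change. Instead, we can use sapply with %in%, which returns a logical matrix indicating whether each value is present in Remove_list. We can then assign NA wherever that matrix is TRUE.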

df[sapply(df, `%in%`, Remove_list)] <- NA

df
#       Domain         Phylum Class Order
#ID_1 Bacteria  Cyanobacteria  <NA>  <NA>
#ID_2 Bacteria  Cyanobacteria  <NA>  <NA>
#ID_3 Bacteria  Bacteroidetes  <NA>  <NA>
#ID_4 Bacteria Proteobacteria  <NA>  <NA>
#ID_5 Bacteria  Bacteroidetes  <NA>  <NA>
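
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The data frame is rebuilt by hand from the table in the question (an assumption about how the original was created), and the names missed, cleaned and cleaned2 are only illustrative. The last step is an alternative base R spelling with lapply and replace, not part of the one-liner above.

```r
# Rebuild the example data frame from the question (row names ID_1..ID_5).
# Assumption: the columns are plain character vectors.
df <- data.frame(
  Domain = rep("Bacteria", 5),
  Phylum = c("Cyanobacteria", "Cyanobacteria", "Bacteroidetes",
             "Proteobacteria", "Bacteroidetes"),
  Class  = rep("Unclassified_c", 5),
  Order  = rep("Unclassified_o", 5),
  row.names = paste0("ID_", 1:5),
  stringsAsFactors = FALSE
)

Remove_list <- c("Unclassified_c", "Unclassified_o",
                 "element_3", "element_4", "element_x")

# `==` compares position by position against the recycled Remove_list,
# so only the cells that happen to line up are set to NA.
missed <- df
missed[missed == Remove_list] <- NA

# `%in%` tests membership column by column, giving a logical matrix
# that marks every listed value.
cleaned <- df
cleaned[sapply(cleaned, `%in%`, Remove_list)] <- NA

# Alternative base R variant (hypothetical, for comparison):
# replace within each column and keep the data frame structure via df[].
cleaned2 <- df
cleaned2[] <- lapply(cleaned2, function(x) replace(x, x %in% Remove_list, NA))
```

Printing cleaned and rownames(cleaned) should show that the row names ID_1 to ID_5 are still in place. The lapply variant may be the safer choice for a data frame with a single row, where sapply returns a plain vector rather than a matrix.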
Ronak Shah