I am trying to subset a data frame by removing rows whose organism column contains certain character patterns, which are stored in a vector. My problem is that only the last pattern in the vector is removed from the data frame. How can I make my loop work iteratively, so that every pattern stored in the vector is removed?

Mock input:

df<-data.frame(organism=c("human_longname","cat_longname","bird_longname","virus_longname","bat_longname","pangolian_longname"),size=c(6,4,2,1,3,5))
df
            organism size
1     human_longname    6
2       cat_longname    4
3      bird_longname    2
4     virus_longname    1
5       bat_longname    3
6 pangolian_longname    5

Code used and its output:

vectors<-c("bat","virus","pangolian")
for(i in vectors){df_1<-df[!grepl(i,df$organism),]}
df_1
        organism size
1 human_longname    6
2   cat_longname    4
3  bird_longname    2
4 virus_longname    1
5   bat_longname    3

Expected output

df_1
        organism size
1 human_longname    6
2   cat_longname    4
3  bird_longname    2
oguz ismail

1 Answer

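You can try this. Note that %in% tests for exact equality, so it only works when the organism values are exactly the strings in the vector, as in the original version of the data: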

df[!df$organism %in% c("bat","virus","pangolian"),]

  organism size
1    human    6
2      cat    4
3     bird    2
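
To illustrate why exact matching is not enough for the longer names, here is a minimal sketch on the mock data from the question. It assumes (this is not stated in the question) that the pattern is always the part of the name before the first underscore; the names exact, prefix and kept_prefix are made up for the example.

```r
# Mock data from the question
df <- data.frame(
  organism = c("human_longname", "cat_longname", "bird_longname",
               "virus_longname", "bat_longname", "pangolian_longname"),
  size = c(6, 4, 2, 1, 3, 5)
)
vectors <- c("bat", "virus", "pangolian")

# %in% compares whole strings, so "bat" does not match "bat_longname"
exact <- df$organism %in% vectors
sum(exact)

# Assumption: the pattern is everything before the first underscore
prefix <- sub("_.*$", "", df$organism)

# Exact matching on the extracted prefix drops the three unwanted rows
kept_prefix <- df[!prefix %in% vectors, ]
kept_prefix
```

If the names do not follow that prefix convention, partial matching with grepl() is the more general route.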

Update: Based on the new data, here is an approach using grepl(). Collapsing the patterns into a single regular expression with paste0() lets grepl() test all of them at once, so no loop is needed:

#Vectors
vectors<-c("bat","virus","pangolian")
#Format
vectors2 <- paste0(vectors,collapse = '|')
#Avoid loop
df[!grepl(pattern = vectors2,df$organism),]

        organism size
1 human_longname    6
2   cat_longname    4
3  bird_longname    2
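
To make the intermediate values visible, the sketch below prints the collapsed pattern and the logical vector that grepl() returns for the mock data. The anchored variant at the end is an assumption added for illustration, not part of the answer: it only helps if every name has the form pattern, underscore, rest. The names hits, anchored and kept are made up for the example.

```r
# Mock data from the question
df <- data.frame(
  organism = c("human_longname", "cat_longname", "bird_longname",
               "virus_longname", "bat_longname", "pangolian_longname"),
  size = c(6, 4, 2, 1, 3, 5)
)
vectors <- c("bat", "virus", "pangolian")

# Collapse the patterns into one alternation
vectors2 <- paste0(vectors, collapse = "|")
vectors2
# "bat|virus|pangolian"

# One logical per row: TRUE where any of the patterns matches
hits <- grepl(vectors2, df$organism)
hits
# FALSE FALSE FALSE  TRUE  TRUE  TRUE

# Hypothetical stricter variant: match only at the start of the name,
# so a name such as "wombat_longname" would not be dropped by "bat"
anchored <- paste0("^(", vectors2, ")_")
kept <- df[!grepl(anchored, df$organism), ]
kept
```

Because the patterns are treated as regular expressions, a pattern containing characters such as . or ( would need escaping before being collapsed this way.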

Also, just out of curiosity, here is a (probably not optimal) loop that does the same task by building an index of the rows to keep and then creating a new data frame:

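#Create index
index <- c()
#Loop over the row numbers
for(i in seq_len(nrow(df)))
{
  #Keep the row if none of the patterns match
  if(!grepl(vectors2,df$organism[i]))
  {
    index <- c(index,i)
  }
}
#Subset once, after the loop has finished
ndf <- df[index,]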

ndf

        organism size
1 human_longname    6
2   cat_longname    4
3  bird_longname    2
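
The loop from the question can also be repaired directly. There, every iteration filters the original df and overwrites df_1, so only the result for the last pattern ("pangolian") survives. A minimal sketch of one possible fix, using the mock data and variable names from the question, is to start from a copy of df and filter that copy on each pass:

```r
# Mock data from the question
df <- data.frame(
  organism = c("human_longname", "cat_longname", "bird_longname",
               "virus_longname", "bat_longname", "pangolian_longname"),
  size = c(6, 4, 2, 1, 3, 5)
)
vectors <- c("bat", "virus", "pangolian")

# Start from the full data frame, then shrink it once per pattern
df_1 <- df
for (i in vectors) {
  # Filter df_1 (not df) so that earlier removals are kept
  df_1 <- df_1[!grepl(i, df_1$organism), ]
}
df_1
```

This keeps the rows for human, cat and bird, the same result as the collapsed-pattern version. If the patterns should be treated as literal text rather than regular expressions, grepl(i, df_1$organism, fixed = TRUE) can be used inside the loop.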
Duck