In a large dataset, I would like to delete every row whose id also appears in a row from sample A. I would then like to remove all rows from sample A as well.

 feature id sample
      a  1      A
      b  1      B
      c  2      A
      d  2      C
      e  3      A
      f  4      B
      g  4      C
      h  5      C
      i  5      C

The output should be:

 feature id sample
      f  4      B
      g  4      C
      h  5      C
      i  5      C

As my dataset has more than 8000 rows, I need a way to do this without specifying row by row which ones to delete. I am not sure how to do that; any advice is welcome.

Ronak Shah
Marine
  • Possible duplicate of [How do I delete rows in a data frame?](https://stackoverflow.com/questions/12328056/how-do-i-delete-rows-in-a-data-frame) – Alex_P Dec 28 '18 at 14:00

3 Answers

Assuming you want to delete all the rows whose id matches an id that occurs in sample "A", you could do

df[!df$id %in% df$id[df$sample == "A"], ]

#  feature id sample
#6       f  4      B
#7       g  4      C
#8       h  5      C
#9       i  5      C

Same with dplyr

library(dplyr)
df %>% filter(!id %in% id[sample == "A"])
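
For a reproducible check, here is a minimal sketch that rebuilds the example data from the question and splits the base R one-liner into two steps. The object names `df`, `ids_in_A` and `result`, and the use of `stringsAsFactors = FALSE`, are assumptions made for illustration.

```r
# Rebuild the example from the question; the object name `df` is assumed.
df <- data.frame(
  feature = letters[1:9],
  id      = c(1, 1, 2, 2, 3, 4, 4, 5, 5),
  sample  = c("A", "B", "A", "C", "A", "B", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# Step 1: collect every id that occurs in at least one row of sample A.
ids_in_A <- unique(df$id[df$sample == "A"])

# Step 2: keep only the rows whose id is not in that set.
# %in% binds more tightly than !, so this reads as !(df$id %in% ids_in_A).
result <- df[!df$id %in% ids_in_A, ]
print(result)
```

Because every row of sample A has an id that is in `ids_in_A`, the rows of sample A are dropped in the same step, so no second filter is needed.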
Ronak Shah

Here is a base R idea using ave,

with(dd, ave(sample, id, FUN = function(i)!'A' %in% i))

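ave returns a vector of the same type as its first argument, so with a character sample column the flags come back as the strings "TRUE" and "FALSE". Wrap the call in as.logical to use it for indexing,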

dd[as.logical(with(dd, ave(sample, id, FUN = function(i)!('A' %in% i)))),]
#  feature id sample
#6       f  4      B
#7       g  4      C
#8       h  5      C
#9       i  5      C
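
To make the intermediate result visible, the following sketch rebuilds the example data and runs the `ave` approach step by step. The name `dd` follows this answer; `flag`, `keep` and `kept` are hypothetical names, and the `as.character()` call is an added safeguard, assuming `sample` might be stored as a factor.

```r
# Example data; `dd` matches the name used in this answer, and the character
# `sample` column (stringsAsFactors = FALSE) is an assumption.
dd <- data.frame(
  feature = letters[1:9],
  id      = c(1, 1, 2, 2, 3, 4, 4, 5, 5),
  sample  = c("A", "B", "A", "C", "A", "B", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# Per-id flag: TRUE when the group contains no "A". as.character() guards
# against a factor column, where logical flags would not be valid levels.
flag <- with(dd, ave(as.character(sample), id, FUN = function(i) !("A" %in% i)))

# ave() keeps the type of its first argument, so the flags are strings here.
keep <- as.logical(flag)

# Index the rows with the logical vector.
kept <- dd[keep, ]
print(kept)
```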
Sotos

We can use subset

subset(df, !id %in% id[sample == "A"])
#    feature id sample
#6       f  4      B
#7       g  4      C
#8       h  5      C
#9       i  5      C
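
As a quick sanity check, this sketch compares the `subset` call with plain logical indexing on the example data. The names `by_subset` and `by_bracket` are hypothetical, and `df` is rebuilt here under the assumption that `sample` is a character column.

```r
# Example data from the question; the object name `df` is assumed.
df <- data.frame(
  feature = letters[1:9],
  id      = c(1, 1, 2, 2, 3, 4, 4, 5, 5),
  sample  = c("A", "B", "A", "C", "A", "B", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# subset() evaluates the condition inside the data frame, so columns can be
# referred to by bare name.
by_subset <- subset(df, !id %in% id[sample == "A"])

# The same filter written with explicit df$ references.
by_bracket <- df[!df$id %in% df$id[df$sample == "A"], ]

# Compare the two results.
print(all.equal(by_subset, by_bracket))
```

The R documentation describes `subset` as a convenience function intended for interactive use, so inside functions or scripts the explicit `[` form may be the safer choice.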
akrun