I would like to exclude every row of a data frame whose sample columns contain ONLY specific patterns (AA, AB, BB). My real data has more than 20k rows and more than 2k columns. Here is a representative input example:

df <- "chr   position sample21s  sample23s sample22s
    chr2    150      AB           BB       AA       
    chr4    250      A            AA       BB
    chr5    350      AB           B        BB   
    chr7    550      AA           AA       AA
    chr8    650      BB           BB       AB"
df <- read.table(text=df, header=TRUE)

Expected output:

chr   position sample21s  sample23s sample22s
chr4    250      A            AA       BB
chr5    350      AB           B        BB   

Any ideas?

user2120870
  • Could you provide a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – r.bot Apr 21 '15 at 19:59
  • Sorry -- you want to *exclude* AA, AB, and BB? In the expected output shown those rows are still present. – verybadatthis Apr 21 '15 at 20:08
  • I want to exclude rows that contain ONLY AA, AB or BB, in any combination or proportion. If a row has at least one other kind of string, I need to keep it in the output `df`. – user2120870 Apr 21 '15 at 20:10

1 Answer

Here's one alternative...

> ind <- apply(df[, grepl("^sample", names(df))], 1,
              function(x) !all(x %in% c("AA", "AB", "BB")))  # TRUE if the row has any other value; works for any number of sample columns

> df[ind, ]
   chr position sample21s sample23s sample22s
2 chr4      250         A        AA        BB
3 chr5      350        AB         B        BB
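
With more than 2k sample columns, calling an R function once per row through `apply` may become slow. A possible vectorized variant, sketched here under the assumption that every sample column name starts with `sample` (as in the example), builds one logical matrix and uses `rowSums`. The names `patterns`, `calls`, `other`, `keep` and `result` are illustrative only.

```r
# Rebuild the example data from the question.
txt <- "chr   position sample21s  sample23s sample22s
chr2    150      AB           BB       AA
chr4    250      A            AA       BB
chr5    350      AB           B        BB
chr7    550      AA           AA       AA
chr8    650      BB           BB       AB"
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Values that, when they are the ONLY values in a row, cause the row to be dropped.
patterns <- c("AA", "AB", "BB")

# Assumption: sample columns are identified by the "sample" prefix.
sample_cols <- grepl("^sample", names(df))

# Character matrix holding just the sample columns.
calls <- as.matrix(df[, sample_cols, drop = FALSE])

# %in% returns a plain vector, so restore the matrix shape explicitly.
# TRUE marks a cell whose value is NOT one of the patterns.
other <- matrix(!(calls %in% patterns), nrow = nrow(calls))

# Keep a row if at least one of its cells holds some other value.
keep <- rowSums(other) > 0
result <- df[keep, ]
print(result)
```

Because the test is "at least one cell outside the set", it does not depend on the number of sample columns.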
Jilber Urbina