
I am trying to delete rows from a data.table that don't meet a criterion: I want to remove every row whose `grp` label does not repeat 18 times. (Label 32 does repeat 18 times; that just isn't visible in the excerpt below.) In the example, the `grp` label 33 appears only 4 times, so I would like those 4 rows removed automatically.

Input:

```r
library(data.table)
x <- fread(x)
tail(x)
           V1  V2  V3 grp
1: uc007cih.1 575 175  32
2: uc007cih.1 576 142  32
3: uc007cih.1 577 104  33
4: uc007cih.1 578  99  33
5: uc007cih.1 579  95  33
6: uc007cih.1 580  94  33
```

The `grp` labels vary and there can be many groups, but any group whose label does not occur exactly 18 times should be deleted. How can I do this?
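One way to express this requirement, sketched here under stated assumptions rather than as the accepted answer: count the rows per `grp` label with `.N`, pick the labels whose count is exactly 18, and keep the rows whose label is in that set. The toy table below is hypothetical (the `V2`/`V3` values are placeholders, not the asker's data); only the column names come from the question.

```r
library(data.table)

# Hypothetical toy data shaped like the question: label 32 has 18 rows, label 33 only 4
x <- data.table(
  V1  = "uc007cih.1",
  V2  = 559:580,
  V3  = 0L,  # placeholder values, not the real data
  grp = rep(c(32L, 33L), times = c(18L, 4L))
)

# Count how many rows carry each grp label
counts <- x[, .N, by = grp]

# Labels that occur exactly 18 times
keep <- counts[N == 18L, grp]

# Keep only rows whose label is in 'keep'; column and row order are unchanged
x.filtered <- x[grp %in% keep]
```

Inspecting `counts` first is a cheap way to see which labels would be dropped before committing to the filter.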

thelatemail
user3141121
    Unless something has changed since [this was posted](http://stackoverflow.com/questions/10790204/how-to-delete-a-row-by-reference-in-r-data-table), there's no way to delete data.table rows by reference. (You can, of course, create a **new** object that contains just the rows you're after). – Josh O'Brien Mar 25 '14 at 22:22
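To illustrate the comment's point with a minimal sketch (toy data hypothetical): subsetting a data.table builds a new object, so "deleting" rows in practice means rebinding the name to the filtered result. `address()` from data.table reports an object's memory address, which makes the change of object visible.

```r
library(data.table)

# Hypothetical toy data: 18 rows in group 32, 4 in group 33
x <- data.table(grp = rep(c(32L, 33L), times = c(18L, 4L)), val = seq_len(22L))

before <- address(x)  # memory address of the original table

# Labels that occur exactly 18 times
keep <- x[, .N, by = grp][N == 18L, grp]

# Subsetting builds a new data.table; assigning it back to 'x' drops the old one
x <- x[grp %in% keep]

after <- address(x)   # differs from 'before': x now names a new object
```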

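Here you go. Group by `grp` and return `.SD` (the group's rows) only when the group has exactly 18 rows (`.N == 18`); groups that fail the test return `NULL` and are dropped from the result:

```r
x.filtered = x[, if(.N == 18) .SD, by = grp]
```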
eddi
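A self-contained check of the one-liner on hypothetical toy data, plus a variant. With `by = grp`, the grouping column is placed first in the result, so the column order changes. The variant, which is an alternative idiom and not part of the original answer, collects the row numbers (`.I`) of qualifying groups and subsets `x` by them, which keeps the original column order; `sort()` keeps the original row order even if groups are interleaved.

```r
library(data.table)

# Hypothetical toy data: column order V1, V2, grp; group 33 is short
x <- data.table(
  V1  = "uc007cih.1",
  V2  = 559:580,
  grp = rep(c(32L, 33L), times = c(18L, 4L))
)

# The answer's one-liner: groups failing the test return NULL and are dropped.
# The grouping column 'grp' comes first in the result.
res1 <- x[, if (.N == 18L) .SD, by = grp]

# Variant keeping the original column order: for each group, .I[.N == 18L]
# yields all of the group's row numbers or none, then x is subset by them
idx  <- x[, .(idx = .I[.N == 18L]), by = grp]$idx
res2 <- x[sort(idx)]
```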