deleting lines in data table if criteria isn't met

Question

I am trying to delete rows from a data.table if they don't meet a criterion. Specifically, I want to delete all rows whose grp label does not occur exactly 18 times (label 32 occurs 18 times; that just isn't visible in the example). In the example below, the grp label "33" occurs only 4 times, so I would like to remove those 4 rows automatically.

Input:

library(data.table)
x <- fread(x)
tail(x)
           V1  V2  V3 grp
1: uc007cih.1 575 175  32
2: uc007cih.1 576 142  32
3: uc007cih.1 577 104  33
4: uc007cih.1 578  99  33
5: uc007cih.1 579  95  33
6: uc007cih.1 580  94  33

The grp labels can vary and there can be several such groups, but any group that doesn't have exactly 18 rows should be deleted. How can I do this?

Unless something has changed since [this was posted](http://stackoverflow.com/questions/10790204/how-to-delete-a-row-by-reference-in-r-data-table), there's no way to delete data.table rows by reference. (You can, of course, create a **new** object that contains just the rows you're after). — Josh O'Brien, Mar 25 '14 at 22:22

score 3 · Accepted Answer · answered Mar 25 '14 at 22:26


Here you go:

x.filtered = x[, if(.N == 18) .SD, by = grp]
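
To see why this works, here is a minimal, self-contained sketch. The toy table is an assumption made up for illustration (18 rows for grp 32, 4 rows for grp 33); it is not the asker's real file. Within each group, `.N` is the number of rows and `.SD` is the group's subset of the data, so the `if` returns the group's rows only when the count is 18 and returns `NULL` (nothing) otherwise. The second variant, using `.I` row indices, is just one possible alternative that keeps the original column order.

```r
library(data.table)

# Hypothetical toy data: grp 32 has 18 rows, grp 33 has only 4.
x <- data.table(
  V1  = "uc007cih.1",
  V2  = 559:580,
  V3  = seq_len(22),
  grp = rep(c(32L, 33L), times = c(18L, 4L))
)

# Per group: return the group's rows (.SD) only if it has exactly 18 rows (.N).
# Groups for which the if() yields NULL drop out of the result.
x.filtered <- x[, if (.N == 18) .SD, by = grp]

# Alternative sketch: collect the row numbers (.I) of qualifying groups,
# then subset the original table with them.
keep.idx <- x[, .(idx = .I[.N == 18]), by = grp]$idx
x.filtered2 <- x[keep.idx]

print(x.filtered)
```

Note that with `by = grp` the grouping column comes first in the result, whereas the `.I`-based subset leaves the columns in their original order. Either way, this creates a new object rather than deleting rows from `x` in place.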


eddi
