Data table, removing leading missing values by group

Question

Below is an example data table, which I would like to remove the rows where value is NA and no earlier row has a value, i.e. also NA, and by group. As not all have the same number leading missing values I'm getting stuck and not having any luck searching.

Example data table

    group    date     value
      a 2015-01-01    NA
      a 2015-01-02     2
      a 2015-01-03     3
      a 2015-01-04    NA
      a 2015-01-05     2
      b 2015-01-01    NA
      b 2015-01-02    NA
      b 2015-01-03     2
      b 2015-01-04    NA
      b 2015-01-05     2

Finished data table

    group    date     value
      a 2015-01-02     2
      a 2015-01-03     3
      a 2015-01-04    NA
      a 2015-01-05     2
      b 2015-01-03     2
      b 2015-01-04    NA
      b 2015-01-05     2

Later I plan to impute the missing values by those that come before and after.

EDIT: Found previously asked question here, which is similar.

score 3 · Accepted Answer · answered Jun 11 '15 at 03:48

3

A basic approach would be to use which and .N, like this:

DT[, .SD[(which(!is.na(value))[1]):.N], by = group]
##    group       date value
## 1:     a 2015-01-02     2
## 2:     a 2015-01-03     3
## 3:     a 2015-01-04    NA
## 4:     a 2015-01-05     2
## 5:     b 2015-01-03     2
## 6:     b 2015-01-04    NA
## 7:     b 2015-01-05     2

answered Jun 11 '15 at 03:48

A5C1D2H2I1M1N2O1R2T1

190,393
28
405
485

1

You could use `which.max` too, as it will take the first `TRUE` - save you needing to subset to `[1]` – thelatemail Jun 11 '15 at 04:50

Data table, removing leading missing values by group

1 Answers1