4

Below is an example data table, which I would like to remove the rows where value is NA and no earlier row has a value, i.e. also NA, and by group. As not all have the same number leading missing values I'm getting stuck and not having any luck searching.

Example data table

    group    date     value
      a 2015-01-01    NA
      a 2015-01-02     2
      a 2015-01-03     3
      a 2015-01-04    NA
      a 2015-01-05     2
      b 2015-01-01    NA
      b 2015-01-02    NA
      b 2015-01-03     2
      b 2015-01-04    NA
      b 2015-01-05     2

Finished data table

    group    date     value
      a 2015-01-02     2
      a 2015-01-03     3
      a 2015-01-04    NA
      a 2015-01-05     2
      b 2015-01-03     2
      b 2015-01-04    NA
      b 2015-01-05     2

Later I plan to impute the missing values by those that come before and after.

EDIT: Found previously asked question here, which is similar.

Community
  • 1
  • 1
Zachary
  • 319
  • 1
  • 7

1 Answers1

3

A basic approach would be to use which and .N, like this:

DT[, .SD[(which(!is.na(value))[1]):.N], by = group]
##    group       date value
## 1:     a 2015-01-02     2
## 2:     a 2015-01-03     3
## 3:     a 2015-01-04    NA
## 4:     a 2015-01-05     2
## 5:     b 2015-01-03     2
## 6:     b 2015-01-04    NA
## 7:     b 2015-01-05     2
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485