I have a dataset with observations of multiple patients and their diagnoses over time. There are 9 different dummy variables, each representing a specific diagnosis, named e.g. L40, L41, K50, M05 and so on.

Where there are missing values in the dummy variables, I want to carry forward the last non-missing value by patient, so that once a patient receives a diagnosis, it will follow through to subsequent observations.

I started with this, using the na.locf function from the zoo package.

diagdata <- originaldata[,grep("^patient|^ar|^edatum|^K|^L|^M",colnames(originaldata))]

require(zoo)
require(data.table)

diagnosis <- data.table(diagdata)

diagnosis[,L40:=na.locf(L40),by=patient]

This achieves what I am looking for, but only on the column in question (L40). Is there any way of applying the above to all the relevant diagnosis columns, i.e. columns starting with K, L and M?

eddi
udden2903
  • Use `setDT` (converts in place) or `as.data.table` (makes a copy) instead of `data.table()` to convert from a `data.frame`. – eddi May 06 '16 at 20:21
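To illustrate the comment above, here is a minimal sketch of the two conversions. The toy `diagdata` is a hypothetical stand-in for the asker's data frame, not the real data.

```r
library(data.table)

# Hypothetical stand-in for the asker's data frame
diagdata <- data.frame(patient = c(1, 1, 2), L40 = c(NA, 1, NA))

# as.data.table() returns a new data.table; diagdata itself stays a data.frame
diagnosis <- as.data.table(diagdata)

# setDT() converts diagdata itself to a data.table by reference, without copying
setDT(diagdata)
```

Which one fits depends on whether the original data frame is still needed afterwards; `setDT` avoids the copy, which can matter for large tables.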

1 Answer

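# Names of all diagnosis columns (those starting with K, L or M)
cols = grep("^K|^L|^M", names(diagnosis), value = TRUE)

# Carry the last non-missing value forward within each patient;
# na.rm = FALSE keeps leading NAs so the row count per group is unchanged
diagnosis[, (cols) := na.locf(.SD, na.rm = FALSE), by = patient, .SDcols = cols]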

Also take a look at the related question "Efficiently locf by groups in a single R data.table".
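For anyone who wants to try this out, here is a small self-contained sketch on made-up data. The toy table and its values are hypothetical, and it uses a column-wise variant (`lapply` over `.SD`) rather than passing `.SD` to `na.locf` directly. It also assumes the rows are already sorted by time within each patient.

```r
library(zoo)
library(data.table)

# Hypothetical toy data: two patients, three yearly observations each
diagnosis <- data.table(
  patient = c(1, 1, 1, 2, 2, 2),
  ar      = c(2010, 2011, 2012, 2010, 2011, 2012),
  L40     = c(NA, 1, NA, NA, NA, 1),
  K50     = c(1, NA, NA, NA, 1, NA)
)

# Diagnosis columns are those starting with K, L or M
cols <- grep("^K|^L|^M", names(diagnosis), value = TRUE)

# Fill each diagnosis column forward within each patient;
# na.rm = FALSE keeps leading NAs so vector lengths stay the same
diagnosis[, (cols) := lapply(.SD, na.locf, na.rm = FALSE),
          by = patient, .SDcols = cols]

print(diagnosis)
```

Observations before a patient's first recorded diagnosis stay `NA`, and nothing is carried over from one patient to the next.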

eddi
  • Isn't there any na.locf alternative implemented only with data.table functions? – skan May 24 '17 at 22:35
  • Rolling joins with `roll = TRUE`, `Inf` or `-Inf` supposedly do the same – M M Jul 17 '17 at 07:43
  • @MM the locf in rolling joins happens **during joins**. It's possible to convert this to a join problem, but that's not an effective use of R. – eddi Jul 17 '17 at 18:37
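On skan's question about a data.table-only alternative: newer data.table releases (1.12.4 or later, to my knowledge) include `nafill`, which supports `type = "locf"` on numeric columns. This postdates the answer above, so treat it as a sketch under that version assumption. The toy data is hypothetical.

```r
library(data.table)

# Hypothetical toy data with numeric dummy columns
diagnosis <- data.table(
  patient = c(1, 1, 1, 2, 2, 2),
  L40     = c(NA, 1, NA, NA, NA, 1),
  K50     = c(1, NA, NA, NA, 1, NA)
)

cols <- grep("^K|^L|^M", names(diagnosis), value = TRUE)

# nafill(type = "locf") carries the last observed value forward;
# grouping by patient keeps values from leaking across patients
diagnosis[, (cols) := lapply(.SD, nafill, type = "locf"),
          by = patient, .SDcols = cols]

print(diagnosis)
```

This avoids the zoo dependency, with the caveat that the dummy columns have to be integer or double for it to apply.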