2

I would like to use na.locf to carry forward non-missing values in data frames where the first observation may be missing (NA).

Problem

dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))
dta %>% mutate_all(.funs = funs(na.locf(.)))

Error in mutate_impl(.data, dots) : Column A must be length 9 (the number of rows) or one, not 7

Desired results

# Load the packages that provide %>%, mutate_all and na.locf
library(dplyr)
library(zoo)

dta <- data.frame(A = c(0, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(0, 5, 4, 5, 8, 9, NA, NA, 100))
dta %>% mutate_all(.funs = funs(na.locf(.)))

Workaround

A potential workaround would be to replace the leading NAs with zeros, carry the zeros forward, and replace them again later. However, I would prefer to leave the NAs where they are, so I am looking for a convenient way to make na.locf leave values untouched when it has not yet seen a non-NA value to carry forward.

Konrad
  • @docendodiscimus You are right, this will work. I was doing `na.rm = TRUE`. My bad, too many hours spent doing one thing. Thanks for your input. – Konrad Nov 09 '17 at 16:10

3 Answers

7

Use the na.rm = FALSE argument. Note that na.locf accepts an entire data frame, so you don't have to apply it to each column separately.

na.locf(dta, na.rm = FALSE)

This gives:

   A   B
1 NA  NA
2 NA   5
3  1   4
4  2   5
5  4   8
6  5   9
7  5   9
8  5   9
9  5 100

Also there is na.locf0:

dta %>% mutate_all(.funs = funs(na.locf0(.)))

See the help page ?na.locf, which documents both the na.rm argument and na.locf0. Note that na.locf0 currently has to be applied to each column individually, but it always produces output of the same length as its input.
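To make the length issue behind the original error concrete, here is a small sketch using column A from the question as a plain vector (the name `x` is just for illustration). With the default `na.rm = TRUE`, the leading NAs are dropped, which is why mutate_all complained about a column of length 7:

```r
library(zoo)

# Column A from the question, as a plain numeric vector
x <- c(NA, NA, 1, 2, 4, 5, NA, NA, NA)

# Default (na.rm = TRUE): the two leading NAs are dropped
length(na.locf(x))                 # 7

# na.rm = FALSE: leading NAs are kept, so the length is preserved
length(na.locf(x, na.rm = FALSE))  # 9

# na.locf0 always returns a vector of the same length as its input
na.locf0(x)                        # NA NA 1 2 4 5 5 5 5
```

Either of the last two forms should be safe inside mutate_all, since both return one value per row.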

G. Grothendieck
3

(Was in the process of writing this answer when @docendodiscimus's comment appeared)

From ?na.locf:

na.rm logical. Should leading NAs be removed?

So use na.rm=FALSE, optionally replacing the remaining NA values (i.e. those that were leading) with zeros thereafter:

dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))
na_zero <- function(x) replace(x,is.na(x),0)
dta %>% mutate_all(.funs = funs(na.locf(.,na.rm=FALSE))) %>%
   mutate_all(.funs=funs(na_zero(.)))
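
Since funs() has been deprecated in later dplyr releases, the same two steps can be sketched with across(), assuming dplyr 1.0.0 or later is installed. The object name `res` is only for illustration, and the zero replacement remains optional:

```r
library(dplyr)
library(zoo)

dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))

res <- dta %>%
  # Carry the last observation forward, keeping leading NAs in place
  mutate(across(everything(), ~ na.locf(.x, na.rm = FALSE))) %>%
  # Optional: replace the NAs that remain (the leading ones) with zeros
  mutate(across(everything(), ~ replace(.x, is.na(.x), 0)))

res
```

Dropping the second mutate() call leaves the leading NAs as they are, which is what the question asks for.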

Ben Bolker
  • except for the zero-replacement bit, which might be useful to others but which I think the OP doesn't actually need, @G.Grothendieck's answer is better than mine. Please go vote for his. – Ben Bolker Nov 09 '17 at 16:23
0

As an additional hint: if you use the na.locf function from the imputeTS package, the na.remaining parameter lets you choose what happens to the NAs that remain after carrying values forward (for locf, these are the leading NAs):

Selections for na.remaining:

  • "keep" - return the series with NAs
  • "rm" - remove remaining NAs
  • "mean" - replace remaining NAs by overall mean
  • "rev" - perform nocb / locf from the reverse direction

The desired output could thus be reached the following way:

dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))

library(imputeTS)
na.locf(dta, na.remaining = "keep")

mutate_all is not necessary here, since na.locf is applied to all columns automatically (this is also the case for na.locf from zoo).
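
As a rough sketch of how two of these options differ, the snippet below uses a plain vector. It assumes imputeTS 3.0 or later, where, to the best of my knowledge, the function is named na_locf and the parameter na_remaining; with older versions the na.locf / na.remaining spelling shown above applies. Calling it with the `imputeTS::` prefix also avoids any name clash with zoo.

```r
# Column A from the question, as a plain numeric vector
x <- c(NA, NA, 1, 2, 4, 5, NA, NA, NA)

# "keep": carry values forward, leave the leading NAs as NA
kept <- imputeTS::na_locf(x, na_remaining = "keep")

# "rev": carry values forward, then fill the leading NAs
# from the opposite direction (next observation carried backward)
filled <- imputeTS::na_locf(x, na_remaining = "rev")

kept
filled
```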

Steffen Moritz