
I have a dataframe that looks like this:

d <- data.frame(county = c("Abilene", rep(NA, 5), "Cook", rep(NA, 4), "Blah", NA, "Allegheny", rep(NA, 3)))

      county
1    Abilene
2       <NA>
3       <NA>
4       <NA>
5       <NA>
6       <NA>
7       Cook
8       <NA>
9       <NA>
10      <NA>
11      <NA>
12      Blah
13      <NA>
14 Allegheny
15      <NA>
16      <NA>
17      <NA>

I want to fill in the <NA> with the value of the previous non-missing county name. In other words, I want to end up with this:

        county
1       Abilene
2       Abilene
3       Abilene
4       Abilene
5       Abilene
6       Abilene
7       Cook
8       Cook
9       Cook
10      Cook
11      Cook
12      Blah
13      Blah
14      Allegheny
15      Allegheny
16      Allegheny
17      Allegheny

So far, I have been looping over every value in d$county, updating a temporary variable with each non-missing county name and refilling each missing cell with it. This is very slow on a large dataframe. I would prefer to do this in dplyr, though I am open to other solutions as well.
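For reference, the loop described above might look something like this sketch. It is a reconstruction, not the actual code, and `fill_loop` is a hypothetical name. It works on a plain vector rather than writing into `d$county[i]` cell by cell, which avoids some data frame overhead, but it is still an element-by-element R loop.

```r
# Hypothetical reconstruction of the loop-based approach from the question
fill_loop <- function(x) {
  last <- NA              # most recent non-missing value seen so far
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      x[i] <- last        # refill the missing cell with the running value
    } else {
      last <- x[i]        # update the running value
    }
  }
  x
}

d <- data.frame(
  county = c("Abilene", rep(NA, 5), "Cook", rep(NA, 4), "Blah", NA, "Allegheny", rep(NA, 3)),
  stringsAsFactors = FALSE
)
d$county <- fill_loop(d$county)
```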

svenkatesh

2 Answers


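Using tidyr, we can use fill(data, ...), where ... names the columns to fill. By default it replaces each NA with the previous non-missing value in that column: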

library(tidyr)
fill(d, county)
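A slightly fuller sketch, assuming a recent tidyr where `fill()` takes a `.direction` argument. `"down"` (the default) copies the last non-missing value forward; `"up"` fills from the next non-missing value below instead, so the trailing NAs after Allegheny have nothing to fill from and stay NA.

```r
library(tidyr)

d <- data.frame(
  county = c("Abilene", rep(NA, 5), "Cook", rep(NA, 4), "Blah", NA, "Allegheny", rep(NA, 3)),
  stringsAsFactors = FALSE
)

# Default direction: each NA takes the last non-missing value above it
down <- fill(d, county, .direction = "down")

# Opposite direction: each NA takes the next non-missing value below it;
# rows 15-17 have no non-missing value below them and remain NA
up <- fill(d, county, .direction = "up")
```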
GGamba
  • FYI, `tidyr::fill()` is written in C++ and is, in my experience, orders of magnitude faster than doing the equivalent operation via an R loop. – jdobres Mar 03 '17 at 03:25
  • @jdobres - in fairness nobody would ever do this in a standard R loop on large data, unless they were torturing themselves. – thelatemail Mar 03 '17 at 04:09
  • Depends on what one means by "large". I was doing a rolling fill operation on a small data set with about a dozen columns and around 100k rows, which I didn't think would take all that long. It took hours. `tidyr::fill()` did the same in seconds. – jdobres Mar 03 '17 at 20:24
  • I mean, you wouldn't need to use a loop in base R to do this - something like http://stackoverflow.com/a/41752185/496803 would be much much more efficient (<1 sec for many million cases). – thelatemail Mar 05 '17 at 22:27
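A sketch of one loop-free base R approach (not necessarily the one in the linked answer; `fill_down` is a hypothetical name): `cumsum(!is.na(x))` gives, at each position, how many non-missing values have appeared so far, which is exactly the index of the most recent non-missing value to copy forward.

```r
# Hypothetical helper: last observation carried forward without a loop
fill_down <- function(x) {
  # Running count of non-missing values; position i maps to the
  # grp[i]-th non-missing value, i.e. the most recent one at or before i
  grp <- cumsum(!is.na(x))
  non_na <- x[!is.na(x)]
  # Positions before the first non-missing value have grp == 0 and stay NA
  keep <- grp > 0
  x[keep] <- non_na[grp[keep]]
  x
}

d <- data.frame(
  county = c("Abilene", rep(NA, 5), "Cook", rep(NA, 4), "Blah", NA, "Allegheny", rep(NA, 3)),
  stringsAsFactors = FALSE
)
d$county <- fill_down(d$county)
```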

We can use na.locf ("last observation carried forward") from the zoo package:

library(zoo)
na.locf(d)
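One thing to watch with `na.locf()`: by default (`na.rm = TRUE`) it drops leading NAs, so on a vector that starts with NA the result is shorter than the input and can no longer be assigned back into the column. Passing `na.rm = FALSE` keeps the length. The vector `x` below is made up for illustration.

```r
library(zoo)

# Made-up vector that starts with a missing value
x <- c(NA, "Abilene", NA, NA, "Cook", NA)

# Default na.rm = TRUE: the leading NA is dropped, so the result is one element shorter than x
shortened <- na.locf(x)

# na.rm = FALSE keeps leading NAs, so the result lines up with the input
aligned <- na.locf(x, na.rm = FALSE)

# Same idea applied to the column from the question
d <- data.frame(
  county = c("Abilene", rep(NA, 5), "Cook", rep(NA, 4), "Blah", NA, "Allegheny", rep(NA, 3)),
  stringsAsFactors = FALSE
)
d$county <- na.locf(d$county, na.rm = FALSE)
```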
akrun