So I have a data frame like this one:

First Group  Bob
             Joe
             John
             Jesse
Second Group Jane
             Mary
             Emily
             Sarah
             Grace

I would like to fill in the empty cells in the first column of the data frame with the last non-empty string above them in that column, i.e.:

First Group  Bob
First Group  Joe
First Group  John
First Group  Jesse
Second Group Jane
Second Group Mary
Second Group Emily
Second Group Sarah
Second Group Grace

With tidyr, there is fill(), but it obviously doesn't work with strings. Is there an equivalent for strings? If not, is there a way to accomplish this?

Frank
Sean
  • Are you sure it doesn't work with character columns? – joran Oct 11 '18 at 19:13
  • Testing it gives me `Error in UseMethod("fill_") : no applicable method for 'fill_' applied to an object of class "character"`, even though `?fill` says it takes atomic vectors. You can do something like this. https://stackoverflow.com/questions/23340150/replace-missing-values-na-with-most-recent-non-na-by-group – Anonymous coward Oct 11 '18 at 19:36
  • `fill` fills missing values, i.e. `NA`. Are your "empty cells" `NA`, or `""` ('blank' `character`); note the difference between `c("a", "", "b")` and `c("a", NA, "b")`. `fill(data.frame(x = c("a", "", "b")), x)`; `fill(data.frame(x = c("a", NA, "b")), x)` – Henrik Oct 11 '18 at 19:38
  • Probably relevant here: check the `na.strings` argument in `read.table`. – Henrik Oct 11 '18 at 19:49
  • It would help us understand your data if you could include a [MWE](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). The easiest and recommended way to do that is to use `dput()` to give your example data with structure. That makes it easy for those who want to help, and you'll greatly increase your chances of better/faster/more numerous responses. – krads Oct 12 '18 at 17:32
  • Searching for a way to fill `NA` in a **character *vector*** took me here; but the answer is [here](https://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value). TL;DR check out the `zoo::na.locf` and similar `zoo::na.___` options – Fons MA Feb 02 '21 at 01:05

3 Answers

It seems fill() is meant to be used as its own step in a pipe rather than inside mutate(). fill() takes a data frame as its first argument, so calling it on a bare column inside a mutate() statement produces this error (regardless of the data type), whereas it works when used as just another component of the pipe. Could that have been the problem?

Just for full clarity, a quick example. Assuming you have a data frame called 'people' with columns 'group' and 'name', the right structure would be:

people %>%
    fill(group)

and the following would give the error you described (and a similar error when using numbers):

people %>%
    mutate(
        group = fill(group)
    )
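
Putting this together with the comments on the question: fill() only fills NA, so blank strings have to be converted first. Below is a minimal sketch, assuming the empty cells are empty strings ("") rather than NA. The `people` data frame is made-up example data modelled on the question, and `filled` is just an illustrative name; `na_if()` is from dplyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data: blank cells are "" rather than NA
people <- data.frame(
  group = c("First Group", "", "", "", "Second Group", "", "", "", ""),
  name  = c("Bob", "Joe", "John", "Jesse", "Jane", "Mary", "Emily", "Sarah", "Grace"),
  stringsAsFactors = FALSE
)

filled <- people %>%
  mutate(group = na_if(group, "")) %>%  # fill() only fills NA, so convert "" first
  fill(group)                           # fill() as its own pipe step; default direction is "down"

filled
```

If the blanks come from reading a file, setting `na.strings = ""` in `read.table` (as suggested in the comments) may make the `na_if()` step unnecessary.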
Thomas Bilach
  • This can work if you introduce the data input to the function (here as "."): `people %>% mutate(group = fill(., group))`. Typically this is extraneous, though people may want the original column for comparison: `people %>% mutate(group_fill = fill(., group))` – glenn_in_boston Dec 08 '22 at 22:21

(I made the assumption that this was output from an R console session. If it's a raw text file the data input may need to be done with read.fwf.)

The display suggests those are empty character values in the "spaces".

First set them to NA and then use na.locf from zoo:

 dat[dat==""] <- NA
 dat[1:2] <- lapply(dat[1:2], zoo::na.locf)
 dat
#------------
      V1    V2    V3
1  First Group   Bob
2  First Group   Joe
3  First Group  John
4  First Group Jesse
5 Second Group  Jane
6 Second Group  Mary
7 Second Group Emily
8 Second Group  Sara
9 Second Group Grace

This is the data I started with:

dat <-
structure(list(V1 = structure(c(2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 
1L), .Label = c("", "First", "Second"), class = "factor"), V2 = structure(c(2L, 
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("", "Group"), class = "factor"), 
    V3 = structure(c(1L, 6L, 7L, 5L, 4L, 8L, 2L, 9L, 3L), .Label = c("Bob", 
    "Emily", "Grace", "Jane", "Jesse", "Joe", "John", "Mary", 
    "Sara"), class = "factor")), class = "data.frame", row.names = c(NA, 
-9L))
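
One caveat worth sketching: by default `zoo::na.locf` drops leading NAs, which would change the length of a column whose first cell is blank. The example below is a small illustration on a plain character vector, not part of the data above; `x` and `filled` are hypothetical names. Passing `na.rm = FALSE` keeps the result the same length as the input.

```r
library(zoo)

# Hypothetical vector whose first element is already missing
x <- c(NA, "First Group", "", "", "Second Group", "")

# Treat blanks as missing; %in% returns FALSE for NA, so the index has no NAs
x[x %in% ""] <- NA

# na.rm = FALSE keeps the leading NA, so the length is preserved
filled <- na.locf(x, na.rm = FALSE)
filled
```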
IRTFM

If I have to take a stab at what your data structure is, I might have something like this:

df <- data.frame(c1=c("First Group", "", "", "", "Second Group", "", "", "", ""),
                 c2=c("Bob","Joe","Jon","Jesse","Jane","Mary","Emily","Sara","Grace"),
                 stringsAsFactors = FALSE)

Then, a very basic way to do this would be by simply looping:

for(i in 2:nrow(df)) if(df$c1[i]=="") df$c1[i] <- df$c1[i-1]  

df

            c1    c2
1  First Group   Bob
2  First Group   Joe
3  First Group   Jon
4  First Group Jesse
5 Second Group  Jane
6 Second Group  Mary
7 Second Group Emily
8 Second Group  Sara
9 Second Group Grace

However, I would suggest you accept @42-'s solution if you have anything other than a small data set, as zoo::na.locf is optimized to work with large numbers of records and is a very respected, widely used, stable package.
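
If an extra package is not wanted, the loop can also be written without explicit iteration. This is a base R sketch under the same assumptions as above (blanks are "" and the first row is not blank); `nonblank` is a helper name introduced here for illustration.

```r
df <- data.frame(c1 = c("First Group", "", "", "", "Second Group", "", "", "", ""),
                 c2 = c("Bob", "Joe", "Jon", "Jesse", "Jane", "Mary", "Emily", "Sara", "Grace"),
                 stringsAsFactors = FALSE)

# TRUE where a group label is present
nonblank <- df$c1 != ""

# cumsum() numbers each run of rows by the label that starts it;
# indexing the non-blank labels by that number repeats each label down its run
df$c1 <- df$c1[nonblank][cumsum(nonblank)]

df
```

This relies on the first row being non-blank: a leading blank would give an index of 0 and shorten the result.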

krads