2

Often times I get data in the format of an excel spreadsheet that looks like:

Name1 Data
      Data
      Data  
Name2 Data
      Data
      Data
....

When I read this into R, the blank rows come us as NA and then I always write some ad hoc code to fill it in so that it looks like:

Name1 Data
Name1 Data
Name1 Data
Name2 Data
Name2 Data
Name2 Data
....

Typically, I just do this with a for loop that keeps track of the last name and every time I see an NA I fill it in. The next time I see something in the name column, I then save the new name and start writing that.

I was wondering if there was a prettier, R vectorized version of that?

Thanks!

3442
  • 8,248
  • 2
  • 19
  • 41
user1357015
  • 11,168
  • 22
  • 66
  • 111

3 Answers3

4

You can try

  indx <- !is.na(df$Col1)
  df$Col1 <- df$Col1[indx][cumsum(indx)] 
  df
  #   Col1 Col2
  #1 Name1 Data
  #2 Name1 Data
  #3 Name1 Data
  #4 Name2 Data
  #5 Name2 Data
  #6 Name2 Data

data

df <- structure(list(Col1 = c("Name1", NA, NA, "Name2", NA, NA),
 Col2 = c("Data", "Data", "Data", "Data", "Data", "Data")), .Names = c("Col1", 
"Col2"), class = "data.frame", row.names = c(NA, -6L))
akrun
  • 874,273
  • 37
  • 540
  • 662
2

I really like @akrun's answer, very clever! I may try to switch to that method in the future. Another option is zoo::na.locf, (locf stands for last observation carried forward). Using akrun's data

> df$Col1 <- zoo::na.locf(df$Col1)
> df
   Col1 Col2
1 Name1 Data
2 Name1 Data
3 Name1 Data
4 Name2 Data
5 Name2 Data
6 Name2 Data

na.locf has a features that help it work in varied cases... options for what to do with leading NAs, you can carry backward rather than forward, and a few other special handlers (see ?na.locf).

(data)

df <- structure(list(Col1 = c("Name1", NA, NA, "Name2", NA, NA),
 Col2 = c("Data", "Data", "Data", "Data", "Data", "Data")), .Names = c("Col1", 
"Col2"), class = "data.frame", row.names = c(NA, -6L))
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
1

Here is another approach using the Reduce function:

> df <- structure(list(Col1 = c("Name1", NA, NA, "Name2", NA, NA),
+        Col2 = c("Data", "Data", "Data", "Data", "Data", "Data")), .Names = c("Col1", 
+        "Col2"), class = "data.frame", row.names = c(NA, -6L))
> 
> df
   Col1 Col2
1 Name1 Data
2  <NA> Data
3  <NA> Data
4 Name2 Data
5  <NA> Data
6  <NA> Data
> df$Col1 <- Reduce(function(a,b) if(is.na(b)) a else b, df$Col1,
+ accumulate=TRUE)
> df
   Col1 Col2
1 Name1 Data
2 Name1 Data
3 Name1 Data
4 Name2 Data
5 Name2 Data
6 Name2 Data
> 
Greg Snow
  • 48,497
  • 6
  • 83
  • 110