R - cumulative sum-"like" operation on a boolean vector

Question

I am struggling with a particular data manipulation issue. I posted recently on this, and was suggested the na.locf function in the zoo library, but this is a homework question and the TA specified not to load other libraries to solve this, so I need to use a more traditional data manipulation approach. We are given the following dataframe, which I show the first 10 rows of:

my_df[1:10,1:2]

     DATE FLIGHT
1  May 26     NA
2  Jun 10     NA
3  Jun 21     NA
4              1
5              2
6              3
7  Jun 23     NA
8              1
9              2
10 Jun 25     NA

I need to drag values down in the DATE column where there are blanks (rows 4-6 all Jun 21, rows 8-9 both Jun 23, etc.). I understand how na.locf makes this a one liner, a shame I cannot use it.

My idea is as follows:

grab the dates that need to be dragged down using:

repeat_dates = my_df$DATE[which(my_df$FLIGHT == 1)-1]

create a vector with the number of times to repeat each date. here 3 corresponds to having to drag Jun 21 down 3 rows, 2 for dragging Jun 23 down 2 rows, etc.
```
repeat_count = [3, 2, ...]
```
use rep.int(repeat_dates, repeat_count) to get the vector:
```
["Jun 21", "Jun 21", "Jun 21", "Jun 23", "Jun 23", ...]
```
and then update the values

I am not sure how to do part 2, if I can get this then I can get the problem. My TA suggested using the cumsum() function, possibly on a vector of booleans (1s / 0s where the DATE is blank?). I'm struggling with this, any help is appreciated!

EDIT - reproducable to help

dput(my_df[1:20,1:2])
structure(list(DATE = structure(c(-23961, -23946, -23935, NA, 
NA, NA, -23933, NA, NA, -23931, -23911, -23893, NA, NA, -23891, 
NA, NA, -23890, NA, NA), class = "Date"), FLIGHT = c(NA, NA, 
NA, 1L, 2L, 3L, NA, 1L, 2L, NA, NA, NA, 1L, 2L, NA, 1L, 2L, NA, 
1L, 2L)), .Names = c("DATE", "FLIGHT"), row.names = c(NA, 20L
), class = "data.frame")

You can just you use a for loop and an if statement when the date is blank to replace it with the previous value. — tatxif, Feb 06 '17 at 20:52
Have a look at http://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value — HubertL, Feb 06 '17 at 20:53
Possible duplicate of [Replacing NAs with latest non-NA value](http://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value) — HubertL, Feb 06 '17 at 20:54
thanks i went with one of the solutions in here, appreciate it — Canovice, Feb 06 '17 at 22:02
Even if you can't use `na.locf` itself, presumably it's ok to look at the source code for it. — G. Grothendieck, Feb 07 '17 at 00:05

score 1 · Answer 1 · answered Feb 06 '17 at 20:55

1

for(i in 1:length(my_df$DATE)){
    if(my_df$DATE[i]==""){
    my_df$DATE[i] <- my_df$DATE[i-1]
    }
}

answered Feb 06 '17 at 20:55

tatxif

438
6
10

yeah this gets the job done a lot simpler than my approach – Canovice Feb 06 '17 at 21:00

score 0 · Accepted Answer · answered Feb 06 '17 at 22:04

I went with a solution in the link shared:

replace_na_with_last<-function(x,a=!is.na(x)){
    x[which(a)[c(1,1:sum(a))][cumsum(a)+1]]
}

this i believe is the approach alluded to by the TA in my class, and works sufficiently, despite an initial confusion to understand the indexing going on.

R - cumulative sum-"like" operation on a boolean vector

2 Answers2