I am struggling with a particular data manipulation issue. I posted recently on this, and was suggested the na.locf function in the zoo library, but this is a homework question and the TA specified not to load other libraries to solve this, so I need to use a more traditional data manipulation approach. We are given the following dataframe, which I show the first 10 rows of:
my_df[1:10,1:2]
DATE FLIGHT
1 May 26 NA
2 Jun 10 NA
3 Jun 21 NA
4 1
5 2
6 3
7 Jun 23 NA
8 1
9 2
10 Jun 25 NA
I need to drag values down in the DATE column where there are blanks (rows 4-6 all Jun 21, rows 8-9 both Jun 23, etc.). I understand how na.locf makes this a one liner, a shame I cannot use it.
My idea is as follows:
grab the dates that need to be dragged down using:
repeat_dates = my_df$DATE[which(my_df$FLIGHT == 1)-1]
create a vector with the number of times to repeat each date. here 3 corresponds to having to drag Jun 21 down 3 rows, 2 for dragging Jun 23 down 2 rows, etc.
repeat_count = [3, 2, ...]
use rep.int(repeat_dates, repeat_count) to get the vector:
["Jun 21", "Jun 21", "Jun 21", "Jun 23", "Jun 23", ...]
and then update the values
I am not sure how to do part 2, if I can get this then I can get the problem. My TA suggested using the cumsum() function, possibly on a vector of booleans (1s / 0s where the DATE is blank?). I'm struggling with this, any help is appreciated!
EDIT - reproducable to help
dput(my_df[1:20,1:2])
structure(list(DATE = structure(c(-23961, -23946, -23935, NA,
NA, NA, -23933, NA, NA, -23931, -23911, -23893, NA, NA, -23891,
NA, NA, -23890, NA, NA), class = "Date"), FLIGHT = c(NA, NA,
NA, 1L, 2L, 3L, NA, 1L, 2L, NA, NA, NA, 1L, 2L, NA, 1L, 2L, NA,
1L, 2L)), .Names = c("DATE", "FLIGHT"), row.names = c(NA, 20L
), class = "data.frame")