
With the help of sebastian-c, I figured out my problem with daily data. Please see: R ifelse condition: frequency of continuously NA

And now I have a data set with hourly data:

set.seed(1234)  
day <- c(rep(1:2, each=24))  
hr <- c(rep(0:23, 2))  
v <- c(rep(NA, 48))   
A <- data.frame(cbind(day, hr, v))  
A$v <- sample(c(NA, rnorm(100)), nrow(A), prob=c(0.5, rep(0.5/100, 100)), replace=TRUE)  

What I need to do: if a day has 4 or more consecutive missing values during daytime hours (7AM-7PM), or 3 or more consecutive missing values during nighttime hours (7PM-7AM), delete that entire day from the data frame; otherwise, fill the gaps by linear interpolation. For example, the second day should be deleted entirely because it has 4 consecutive NAs during the daytime (7AM-10AM). The result should preferably remain a data frame. Please help, thank you!

Rosa

1 Answer


If I modify the NA_run function from the question you linked to take a variable named v instead of value and return the boolean rather than the data.frame:

NA_run <- function(x, maxlen){
  runs <- rle(is.na(x$v))
  any(runs$lengths[runs$values] >= maxlen)
}
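
To see what NA_run is doing, here is a minimal sketch of the same rle() logic on a toy vector. The vector and the variable names na_lengths and has_long_run are made up for illustration and are not part of the original data:

```r
# Toy vector (hypothetical): one isolated NA, then three NAs in a row
v <- c(1.2, NA, 0.5, NA, NA, NA, 2.0)

# rle() collapses the TRUE/FALSE vector from is.na() into runs
runs <- rle(is.na(v))

# Keep only the lengths of the runs whose value is TRUE, i.e. the NA runs
na_lengths <- runs$lengths[runs$values]

# Same test as NA_run: is any NA run at least maxlen long? (maxlen = 3 here)
has_long_run <- any(na_lengths >= 3)
```

With this input the NA runs have lengths 1 and 3, so a threshold of 3 flags the vector and a threshold of 4 does not.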

I can then write a wrapper function to call it twice for daytime and nighttime:

dropfun <- function(x) {
  dt <- x$hr >= 7 & x$hr < 19  # daytime: 7AM (inclusive) up to 7PM (exclusive)
  daytime <- NA_run(x[dt,], 4)
  nighttime <- NA_run(x[!dt,], 3)

  any(daytime, nighttime)
}

Calling dropfun once per day with ddply gives a data.frame that flags each day, where TRUE means the day should be dropped:

> library(plyr)  # provides ddply
> ddply(A, .(day), dropfun)
  day    V1
1   1  TRUE
2   2 FALSE
> 
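
One thing that may be worth checking in the wrapper: the nighttime subset x[!dt,] pools the early-morning hours and the evening hours of the same calendar day into one vector, so an NA at the end of the morning block and NAs at the start of the evening block are counted as a single run. Whether that is what you want is an assumption you need to make. The sketch below uses made-up data and a hypothetical helper max_na_run to compare the pooled check with checking the two blocks separately:

```r
# Hypothetical helper: length of the longest NA run in a vector (0 if none)
max_na_run <- function(v) {
  runs <- rle(is.na(v))
  na_lengths <- runs$lengths[runs$values]
  if (length(na_lengths) == 0) 0L else max(na_lengths)
}

# Made-up single day: NA at 6AM, and at 7PM and 8PM
hr <- 0:23
v <- rep(1, 24)
v[hr %in% c(6, 19, 20)] <- NA

night <- !(hr >= 7 & hr < 19)

pooled <- max_na_run(v[night])     # hours 0-6 and 19-23 glued together
early  <- max_na_run(v[hr < 7])    # hours 0-6 only
late   <- max_na_run(v[hr >= 19])  # hours 19-23 only
```

Here the pooled check sees a run of 3 and would drop the day, while the separate blocks have runs of 1 and 2. Neither version joins 11PM of one day to midnight of the next, since the data is split by day first.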

Alternatively, dropfun can return the day's rows when the day is kept and NULL when it should be dropped, so that ddply returns the filtered data.frame directly:

dropfun <- function(x) {
  dt <- x$hr >= 7 & x$hr < 19  # daytime: 7AM (inclusive) up to 7PM (exclusive)
  daytime <- NA_run(x[dt,], 4)
  nighttime <- NA_run(x[!dt,], 3)

  if(any(daytime, nighttime)) NULL else x
}

> ddply(A, .(day), dropfun)
   day hr           v
1    2  0          NA
2    2  1          NA
3    2  2  2.54899107
4    2  3          NA
5    2  4 -0.03476039
6    2  5          NA
7    2  6  0.65658846
8    2  7  0.95949406
9    2  8          NA
10   2  9  1.08444118
11   2 10  0.95949406
12   2 11          NA
13   2 12 -1.80603126
14   2 13          NA
15   2 14          NA
16   2 15  0.97291675
17   2 16          NA
18   2 17          NA
19   2 18          NA
20   2 19 -0.29429386
21   2 20  0.87820363
22   2 21          NA
23   2 22  0.56305582
24   2 23 -0.11028549
> 
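
The question also asks for linear interpolation on the days that are kept. A minimal sketch using base R's approx() follows, assuming interpolation should be done within each day and that leading or trailing NAs should stay NA. The function name interp_day and the small kept data.frame are hypothetical, for illustration only:

```r
# Hypothetical helper: linearly interpolate v over hr within one day's rows
interp_day <- function(x) {
  ok <- !is.na(x$v)
  # approx() needs at least two known points to interpolate between
  if (sum(ok) >= 2) {
    # rule = 1 leaves values outside the observed range as NA
    x$v <- approx(x$hr[ok], x$v[ok], xout = x$hr, rule = 1)$y
  }
  x
}

# Made-up example standing in for the days that survived the filter
kept <- data.frame(day = rep(1:2, each = 4),
                   hr  = rep(0:3, 2),
                   v   = c(1, NA, 3, 4,  NA, 2, NA, 6))

# Apply per day with base R; ddply(kept, .(day), interp_day) would also work
filled <- do.call(rbind, lapply(split(kept, kept$day), interp_day))
```

In this toy input the interior gaps are filled from their neighbours, while the NA at the start of day 2 is left alone because there is no earlier value in that day to interpolate from.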
Justin