I am looking for an R equivalent of the pandas method fillna(method='bfill', limit=30).

I have this data frame.

DATE                      ELE.CN
   <dttm>                     <dbl>
 1 2009-06-30 00:00:00 115942928608
 2 2009-06-28 00:00:00 115942928608
 3 2009-06-27 00:00:00 115942928608
 4 2009-06-26 00:00:00 115942928608
 5 2009-06-24 00:00:00           NA
 6 2009-06-23 00:00:00           NA
 7 2009-06-21 00:00:00           NA
 8 2009-06-20 00:00:00           NA
 9 2009-06-19 00:00:00           NA
10 2009-06-17 00:00:00           NA
...

The idea is to fill the NAs, but only up to a limit of 30 consecutive values. I have searched but have not found anything similar.

Thank you.

Mick

2 Answers

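One potential solution is vec_fill_missing() from the vctrs package, which has a max_fill argument that caps how many consecutive missing values are filled (the example below uses max_fill = 3 so the limit is visible on ten rows; use max_fill = 30 for your case):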

library(tidyverse)
library(vctrs)

df <- read.table(text = "DATE                      ELE.CN
 2009-06-30 00:00:00 115942928608
 2009-06-28 00:00:00 115942928608
 2009-06-27 00:00:00 115942928608
 2009-06-26 00:00:00 115942928608
 2009-06-24 00:00:00           NA
 2009-06-23 00:00:00           NA
 2009-06-21 00:00:00           NA
 2009-06-20 00:00:00           NA
 2009-06-19 00:00:00           NA
 2009-06-17 00:00:00           NA", header = TRUE)
df
#>                DATE       ELE.CN
#> 2009-06-30 00:00:00 115942928608
#> 2009-06-28 00:00:00 115942928608
#> 2009-06-27 00:00:00 115942928608
#> 2009-06-26 00:00:00 115942928608
#> 2009-06-24 00:00:00           NA
#> 2009-06-23 00:00:00           NA
#> 2009-06-21 00:00:00           NA
#> 2009-06-20 00:00:00           NA
#> 2009-06-19 00:00:00           NA
#> 2009-06-17 00:00:00           NA

df %>%
  mutate(ELE.CN = vec_fill_missing(ELE.CN, max_fill = 3))
#>                DATE       ELE.CN
#> 2009-06-30 00:00:00 115942928608
#> 2009-06-28 00:00:00 115942928608
#> 2009-06-27 00:00:00 115942928608
#> 2009-06-26 00:00:00 115942928608
#> 2009-06-24 00:00:00 115942928608
#> 2009-06-23 00:00:00 115942928608
#> 2009-06-21 00:00:00 115942928608
#> 2009-06-20 00:00:00           NA
#> 2009-06-19 00:00:00           NA
#> 2009-06-17 00:00:00           NA

Created on 2022-07-14 by the reprex package (v2.0.1)
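
Since the question mentions pandas' bfill, which fills from the next valid value rather than the previous one, the direction argument of vec_fill_missing() may also be relevant. A minimal sketch on a made-up vector (x and max_fill = 2 are illustrative only, not taken from the question's data):

```r
library(vctrs)

# Made-up example vector: a run of four NAs, then a run of two NAs
x <- c(NA, NA, NA, NA, 5, NA, NA, 8)

# direction = "up" fills each NA from the next non-missing value (like bfill);
# max_fill = 2 fills at most two consecutive NAs per run
filled <- vec_fill_missing(x, direction = "up", max_fill = 2)
filled
```

With the default direction = "down", the same call fills from the previous non-missing value instead, which is what the data in the question needs because the NAs sit below the known values.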

jared_mamrot

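Here is another suggestion. It works on your example but will not generalize without tweaking: cumsum(is.na(ELE.CN)) counts NAs across the whole column rather than per gap, and first(ELE.CN) always fills with the first value of the column: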

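library(dplyr)

df %>% 
  # group_by() + ungroup() is used here only to add group_id,
  # a running count of the NAs seen so far
  group_by(group_id = cumsum(is.na(ELE.CN))) %>% 
  ungroup() %>% 
  # Fill the first 30 NAs with the first value of the column;
  # .keep = "unused" drops the helper column group_id afterwards
  mutate(ELE.CN = ifelse(is.na(ELE.CN) & group_id <= 30, 
                         first(ELE.CN), ELE.CN), .keep = "unused") 
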
   DATE                      ELE.CN
   <chr>                      <dbl>
 1 2009-06-30 00:00:00 115942928608
 2 2009-06-28 00:00:00 115942928608
 3 2009-06-27 00:00:00 115942928608
 4 2009-06-26 00:00:00 115942928608
 5 2009-06-24 00:00:00 115942928608
 6 2009-06-23 00:00:00 115942928608
 7 2009-06-21 00:00:00 115942928608
 8 2009-06-20 00:00:00 115942928608
 9 2009-06-19 00:00:00 115942928608
10 2009-06-17 00:00:00 115942928608
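
For the more general case, where the limit should apply to each run of NAs separately, a base R helper along these lines might work. This is only a sketch: fill_down_limited() and fill_up_limited() are hypothetical names, not functions from any package, and the sample vector is made up.

```r
# Hypothetical helper: carry the last non-NA value forward,
# filling at most `limit` consecutive NAs after each known value.
fill_down_limited <- function(x, limit) {
  idx <- seq_along(x)
  # Index of the most recent non-NA value at or before each position (0 if none yet)
  last_seen <- cummax(ifelse(is.na(x), 0L, idx))
  # Distance from each position back to that value
  gap <- idx - last_seen
  # Fill only NAs that have a previous value within `limit` positions
  fillable <- is.na(x) & last_seen > 0L & gap <= limit
  x[fillable] <- x[last_seen[fillable]]
  x
}

# Hypothetical counterpart for a backward fill (like bfill): reverse, fill, reverse back
fill_up_limited <- function(x, limit) {
  rev(fill_down_limited(rev(x), limit))
}

x <- c(1, NA, NA, NA, 2, NA)
down <- fill_down_limited(x, limit = 2)
up <- fill_up_limited(x, limit = 2)
```

On the data frame from the question this would be applied as df$ELE.CN <- fill_down_limited(df$ELE.CN, limit = 30).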
TarJae