0

I'm using the tidyr complete() and fill() functions to extend (copy down) a dataframe so all ID elements have the same number of rows. The code posted at the bottom correctly extends all fields, with the exception of the "Bal2" column of the dataframe where a series of NA's should be extended. Any recommendations for how to correct this?

The NA values do serve a calculation purpose in the fuller code this is deployed in. Also please note that I have another code snippet for correctly extending the "Period_2" column so I don't need help with "Period_2". It's been omitted for code brevity.

The below illustrates the issue when generating the testDF and testDF1 dataframes:

enter image description here

Code:

library(dplyr)
library(tidyr)

testDF <-
  data.frame(
    ID = c(rep(1,5),rep(50,3),rep(60,3)),
    Period_1 = c(1:5,1:3,1:3),
    Period_2 = c("2012-06","2012-07","2012-08","2012-09","2012-10","2013-06","2013-07","2013-08","2012-10","2012-11","2012-12"),
    Bal1 = c(rep(10,5),21:23,36:34),
    Bal2 = c(rep(12,8),rep(NA,3))
  )

testDF1 <- testDF %>%
  tidyr::complete(ID, nesting(Period_1)) %>%
  tidyr::fill(Bal1, Bal2, .direction = "down")
testDF1 <- as.data.frame(testDF1)
Village.Idyot
  • 1,359
  • 2
  • 8
  • 2
    Try a `group_by(ID)` before `fill`. – stefan Nov 23 '22 at 13:31
  • Does this answer your question? [R replace NA with last value for group ID ordered by date](https://stackoverflow.com/questions/71133716/r-replace-na-with-last-value-for-group-id-ordered-by-date) – Captain Hat Nov 23 '22 at 13:31
  • 2
    @stefan I was just about to post the same (as an answer), but you just beat me to it. It's your answer to post. – r2evans Nov 23 '22 at 13:32
  • 2
    Thx @r2evans. Always feel free to add an answer even if I posted as comment. When an issue can be fixed easily, one always wavers to add an answer. – stefan Nov 23 '22 at 14:05

2 Answers2

1

As mentioned in the comments, group by the ID and this should be resolved:

library(dplyr)
# library(tidyr)
testDF %>%
  tidyr::complete(ID, tidyr::nesting(Period_1)) %>%
  group_by(ID) %>%
  tidyr::fill(Bal1, Bal2, .direction = "down") %>%
  ungroup()
# # A tibble: 15 x 5
#       ID Period_1 Period_2  Bal1  Bal2
#    <dbl>    <int> <chr>    <dbl> <dbl>
#  1     1        1 2012-06     10    12
#  2     1        2 2012-07     10    12
#  3     1        3 2012-08     10    12
#  4     1        4 2012-09     10    12
#  5     1        5 2012-10     10    12
#  6    50        1 2013-06     21    12
#  7    50        2 2013-07     22    12
#  8    50        3 2013-08     23    12
#  9    50        4 NA          23    12
# 10    50        5 NA          23    12
# 11    60        1 2012-10     36    NA
# 12    60        2 2012-11     35    NA
# 13    60        3 2012-12     34    NA
# 14    60        4 NA          34    NA
# 15    60        5 NA          34    NA
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Ahh yes! I see you commented out `library(tidyr)`. Is it not necessary when the package is cited as in `tidyr::complete(...)`? – Village.Idyot Nov 23 '22 at 15:08
  • 1
    Correct: `tidyr::` will always work whether you `library(tidyr)` or not. The use of `::` for package-specific functions is sometimes a personal preference, occasionally (i.e., in packages) a necessity when dealing with multiple same-name functions (e.g., `stats::filter`/`dplyr::filter`, or `dplyr::summarize`/`plyr::summarize`, or many others), and you want absolutely clarity on which is being used (regardless of what the user may have loaded at the time ... and just as importantly, whatever _order_ the packages are loaded, as that is also relevant. – r2evans Nov 23 '22 at 15:22
  • 1
    The reason I included it in this answer as commented-out code is to identify that it is needed to execute the code but does not necessarily need to be loaded into the current namespace. I chose that path because your question used `tidyr::` in most places (but not all, `nesting` was not `::`-specified, which for consistency it should be). – r2evans Nov 23 '22 at 15:23
0

Use dplry::group_by() before you fill():

require(dplyr)
#> Loading required package: dplyr
require(tidyr)
#> Loading required package: tidyr

test <- tribble(
  ~id,    ~value,
  "A",    80,
  "A",    NA,
  "A",    NA,
  "B",    NA,
  "B",    NA
)

fill(test, value)
#> # A tibble: 5 × 2
#>   id    value
#>   <chr> <dbl>
#> 1 A        80
#> 2 A        80
#> 3 A        80
#> 4 B        80
#> 5 B        80

test <- group_by(test, id)
fill(test, value)
#> # A tibble: 5 × 2
#> # Groups:   id [2]
#>   id    value
#>   <chr> <dbl>
#> 1 A        80
#> 2 A        80
#> 3 A        80
#> 4 B        NA
#> 5 B        NA

Created on 2022-11-23 with reprex v2.0.2

Captain Hat
  • 2,444
  • 1
  • 14
  • 31
  • 3
    You should almost always use `library`, not `require`. The latter never stops following code when the package is not available, which is almost never what is intended. Refs: https://stackoverflow.com/a/51263513 – r2evans Nov 23 '22 at 14:16