0

I want to complete a df in R when in it miss a month date for example if I have one year of information by months and days like this one.

df = data.frame(Date = c("2020-01-01","2020-02-01",
"2020-03-02","2020-04-01","2020-09-01","2020-10-01",
"2020-11-01","2020-12-01"))

When I use the function complete, I use it like this

 df = df%>%
  mutate(Date = as.Date(Date)) %>%
  complete(Date= seq.Date("2020-01-01", "2020-12-31", by = "month"))

And the problem is that my final df complete all the dates like may, june, july and that is ok but also complete march because march doesn't have the first day and begings in 2020-03-02.

 df = data.frame(Date = c("2020-01-01","2020-02-01",
"2020-03-01","2020-03-02","2020-04-01","2020-05-01",
"2020-06-01","2020-07-01","2020-08-01","2020-09-01",
"2020-10-01","2020-11-01","2020-12-01"))

Do you know how to complete df only if the df doesn't have any date of a month?

In my case I don't want to complete march because march has a date already.

Thanks a lot.

Exthelion
  • 63
  • 6

2 Answers2

2

A possible solution would be completing only by yearmon from the zoo package, so that it the actual day of the month is irrelevant.

library(dplyr) 
library(zoo) # for as.yearmon
library(tidyr) # for complete

df <- data.frame(Date = c("2020-01-01","2020-02-01",
                          "2020-03-02","2020-04-01",
                          "2020-09-01","2020-10-01",
                          "2020-11-01","2020-12-01"), 
                 id = 1:8)
df
#>         Date id
#> 1 2020-01-01  1
#> 2 2020-02-01  2
#> 3 2020-03-02  3
#> 4 2020-04-01  4
#> 5 2020-09-01  5
#> 6 2020-10-01  6
#> 7 2020-11-01  7
#> 8 2020-12-01  8

df %>% 
  mutate(Date = as.Date(Date), 
         year_mon = as.yearmon(Date)) %>%
  complete(
    year_mon = seq.Date(as.Date("2020-01-01"),
                    as.Date("2020-12-31"),
                    by = "month") %>% as.yearmon()
  )
#> # A tibble: 12 x 3
#>    year_mon  Date          id
#>    <yearmon> <date>     <int>
#>  1 Jan 2020  2020-01-01     1
#>  2 Feb 2020  2020-02-01     2
#>  3 Mar 2020  2020-03-02     3
#>  4 Apr 2020  2020-04-01     4
#>  5 May 2020  NA            NA
#>  6 Jun 2020  NA            NA
#>  7 Jul 2020  NA            NA
#>  8 Aug 2020  NA            NA
#>  9 Sep 2020  2020-09-01     5
#> 10 Oct 2020  2020-10-01     6
#> 11 Nov 2020  2020-11-01     7
#> 12 Dec 2020  2020-12-01     8

Created on 2021-06-25 by the reprex package (v2.0.0)

Marcelo Avila
  • 2,314
  • 1
  • 14
  • 22
2

You can extract year and month value from the Date and use complete on that.

library(dplyr)
library(lubridate)
library(tidyr)

df %>% 
  mutate(Date = as.Date(Date), 
         year = year(Date), 
         month = month(Date)) %>%
  complete(year, month = 1:12) %>%
  mutate(Date = if_else(is.na(Date), 
                        as.Date(paste(year, month, 1, sep = '-')), Date)) %>%
  select(Date)

#    Date      
#   <date>    
# 1 2020-01-01
# 2 2020-02-01
# 3 2020-03-02
# 4 2020-04-01
# 5 2020-05-01
# 6 2020-06-01
# 7 2020-07-01
# 8 2020-08-01
# 9 2020-09-01
#10 2020-10-01
#11 2020-11-01
#12 2020-12-01
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213