0

I have a data of random dates from 2008 to 2020 and their corresponding value

Date                    Val
September 16, 2012       32
September 19, 2014       33
January 05, 2008         26
June 07, 2017            02
December 15, 2019        03
May 28, 2020             18

I want to fill the missing dates from January 01 2008 to March 31, 2020 and their corresponding value as 1.

I refer some of the post like Post1, Post2 and I am not able to solve the problem based on that. I am a beginner in R.

I am looking for data like this

 Date                    Val
 January 01, 2008        1
 January 02, 2008        1
 January 03, 2008        1
 January 04, 2008        1
 January 05, 2008       26
 ........
djMohit
  • 151
  • 1
  • 10

2 Answers2

1

Use tidyr::complete :

library(dplyr)

df %>%
  mutate(Date = as.Date(Date, "%B %d, %Y")) %>%
  tidyr::complete(Date = seq(as.Date('2008-01-01'), as.Date('2020-03-31'), 
                           by = 'day'), fill = list(Val = 1)) %>%
  mutate(Date = format(Date, "%B %d, %Y"))


# A tibble: 4,475 x 2
#   Date               Val
#   <chr>            <dbl>
# 1 January 01, 2008     1
# 2 January 02, 2008     1
# 3 January 03, 2008     1
# 4 January 04, 2008     1
# 5 January 05, 2008    26
# 6 January 06, 2008     1
# 7 January 07, 2008     1
# 8 January 08, 2008     1
# 9 January 09, 2008     1
#10 January 10, 2008     1
# … with 4,465 more rows

data

df <- structure(list(Date = c("September 16, 2012", "September 19, 2014", 
"January 05, 2008", "June 07, 2017", "December 15, 2019", "May 28, 2020"
), Val = c(32L, 33L, 26L, 2L, 3L, 18L)), class = "data.frame", 
row.names = c(NA, -6L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

We can create data frame with the desired date range and then join our data frame on it and replace all NAs with 1:

library(tidyverse)
days_seq %>% 
  left_join(df) %>% 
  mutate(Val = if_else(is.na(Val), as.integer(1), Val))

Joining, by = "Date"
# A tibble: 4,474 x 2
   Date         Val
   <date>     <int>
 1 2008-01-01     1
 2 2008-01-02     1
 3 2008-01-03     1
 4 2008-01-04     1
 5 2008-01-05    33
 6 2008-01-06     1
 7 2008-01-07     1
 8 2008-01-08     1
 9 2008-01-09     1
10 2008-01-10     1
# ... with 4,464 more rows

Data

days_seq <- tibble(Date = seq(as.Date("2008/01/01"), as.Date("2020/03/31"), "days"))

df <- tibble::tribble(
                   ~Date, ~Val,
        "2012/09/16",  32L,
        "2012/09/19",  33L,
        "2008/01/05",  33L
        ) 
df$Date <- as.Date(df$Date)
Ahorn
  • 3,686
  • 1
  • 10
  • 17