0

I have a table with lots of report gaps (missing report dates). I want to fix the data by adding the missing report dates onto each account (represented by ID).

For example, let's set the start date as 2014-07-01 and the end date as 2018-03-13. There are 970 report dates excluding the weekends in total.

The table looks like below...Let's call it "df". My original df is the data frame shown below without rows beginning with GAP. Rows starting with GAP are rows I want to add.

for step 1, I want to add all the missing report dates onto this table (rows starting with GAP below), and then for step 2, I want to fill in all the "balances" I added from step 1 with 0. And for step 3, I want to fill in other variables with the last value before the gap. And for the last step, I want to apply it to all IDs.

                     ID      report_date    balance    other_v1    other_v2
DAY 1                 1       7/1/2014        1500         0
DAY 2                 1       7/2/2014        1500        0.1
GAP 1 (DAY 3)         1       7/3/2014          0         0.1
GAP 2 (DAY 4)         1        ...              0         0.1
...                   1        ...              0...      0.1..
GAP 30 (DAY 32)       1       8/10/2014         0         0.1
DAY 33                1       8/11/2014       1500        0.3
DAY 34                1       8/12/2014       1600        0.3
DAY 35                1       8/13/2014       1600        0.25
GAP 31 (DAY 36)       1       8/14/2014         0         0.25
GAP 32 (DAY 37)       1       8/15/2014         0         0.25
...                   1       ...               0         ...
DAY 970 (OR GAP N)    1       3/13/2018         0         ...
lmo
  • 37,904
  • 9
  • 56
  • 69
NightDog
  • 91
  • 7
  • 1
    When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Avoid unusable examples with "..." because that's difficult to test with. – MrFlick May 10 '18 at 19:23

1 Answers1

0

For step 1 could create a new data base with all the values you want, like this:

df = data.frame(as.Date("07/01/2014", "%m/%d/%Y"):as.Date("03/13/2018", "%m/%d/%Y"))
names(df) = "date"

df$date = as.Date(df$date, origin = '1970-01-01')
df$weekday = weekdays.Date(df$date)

df = df[-which(df$weekday=="saturday"),]
df = df[-which(df$weekday=="sunday"),]

Then you just need to use cbind(your_db, df), and you have your dates column! For step 2 and 3 I would need reproducible data.. If you could give us a reproducible example it wold be easier and faster to help you! :)

Giovana Stein
  • 451
  • 3
  • 13