*Edit: I have provided `dput()` sample data at the bottom of the page.*
I have a large data frame that I am trying to expand. Specifically, I want to use the start and end date columns to produce a row for every month each person appears in the dataset, and to carry the values of the other columns over into each of those new rows. Using the two segments of code below, I was previously able to produce the expanded data frame I wanted. Essentially, I want to turn this:
Name | startdate | enddate | score |
---|---|---|---|
John | 1970-01-01 | 1970-12-31 | 133 |
Tim | 1970-08-15 | 1970-12-31 | 184 |
into this:
Name | score | month |
---|---|---|
John | 133 | 1970-01-01 |
John | 133 | 1970-02-01 |
John | 133 | 1970-03-01 |
John | 133 | 1970-04-01 |
John | 133 | 1970-05-01 |
John | 133 | 1970-06-01 |
John | 133 | 1970-07-01 |
John | 133 | 1970-08-01 |
John | 133 | 1970-09-01 |
John | 133 | 1970-10-01 |
John | 133 | 1970-11-01 |
John | 133 | 1970-12-01 |
Tim | 184 | 1970-08-01 |
Tim | 184 | 1970-09-01 |
Tim | 184 | 1970-10-01 |
Tim | 184 | 1970-11-01 |
Tim | 184 | 1970-12-01 |
```r
dwnom$startdate <- as.Date(dwnom$startdate, "%Y-%m-%d")
dwnom$enddate <- as.Date(dwnom$enddate, "%Y-%m-%d")
M <- Map(seq, dwnom$startdate, dwnom$enddate, by = "month")
```
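One detail of `seq()` on `Date` objects may matter for the target table: with `by = "month"` it steps forward from the `from` date and keeps its day of the month, so a start date of 1970-08-15 yields mid-month dates rather than the first-of-month values shown for Tim. The sketch below assumes first-of-month dates are what's wanted and snaps the start date with `format(..., "%Y-%m-01")`; `start_date`, `end_date`, `mid_month` and `first_of_month` are illustrative names only.

```r
# seq() on Dates with by = "month" keeps the day of the month of `from`.
start_date <- as.Date("1970-08-15")
end_date <- as.Date("1970-12-31")
mid_month <- seq(start_date, end_date, by = "month")
print(mid_month)
# -> "1970-08-15" "1970-09-15" "1970-10-15" "1970-11-15" "1970-12-15"

# Snapping the start to the 1st of its month gives first-of-month dates,
# matching the `month` column in the target table.
start_floor <- as.Date(format(start_date, "%Y-%m-01"))
first_of_month <- seq(start_floor, end_date, by = "month")
print(first_of_month)
```

According to the `seq.Date` documentation, when stepping by month lands on a day that does not exist (for example the 31st of a 30-day month), R counts forward into the next month, so snapping to the 1st also avoids that kind of drift.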
This code ran without any problems a few months ago. However, I am now getting this error: `Error in seq.int(r1$mon, 12 * (to0$year - r1$year) + to0$mon, by) : 'from' must be a finite number`. I've looked at other answers about this specific error, and most suggested that something might be wrong with the date format. So I added `"%Y-%m-%d"` to the `as.Date` calls to specify the format explicitly, but I'm still getting the same error.
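A likely explanation, not confirmed against the full data: `as.Date()` does not raise an error when a string does not match the format; it silently returns `NA`. In the traceback, the `'from'` handed to `seq.int()` is derived from the start date, so an `NA` start date in any row would make `seq()` fail inside `Map()`. The sketch below uses a hypothetical two-row data frame, `dwnom_bad`, whose second start date is empty, to show how to list the rows whose dates failed to parse.

```r
# Hypothetical sample: the second row has an empty start date, standing in
# for whatever malformed value the full data may contain.
dwnom_bad <- data.frame(
  name = c("bonner_josiah", "rogers_michael"),
  startdate = c("2009-01-01", ""),
  enddate = c("2010-12-31", "2010-12-31"),
  stringsAsFactors = FALSE
)

# Non-matching strings become NA here instead of raising an error.
start <- as.Date(dwnom_bad$startdate, "%Y-%m-%d")
end <- as.Date(dwnom_bad$enddate, "%Y-%m-%d")

# Rows where either date failed to parse; seq() cannot use these.
bad_rows <- which(is.na(start) | is.na(end))
print(dwnom_bad[bad_rows, ])

# The same failure in isolation: seq() with an NA start date errors out.
err <- tryCatch(
  seq(start[2], end[2], by = "month"),
  error = function(e) e
)
print(conditionMessage(err))
```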
This is where I am stuck. Once I get this line to work, I plan to run the rest of the code below to produce the expanded dataframe I want.
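```r
# (Optional) convert the date columns back to character
dwnom$startdate <- as.character(dwnom$startdate)
dwnom$enddate <- as.character(dwnom$enddate)

# Repeat each person's name and score once per month in M, then stack
# the per-person month vectors into a single Date column
dwnom <- data.frame(
  name = rep.int(dwnom$name, lengths(M)),
  score = rep.int(dwnom$score, lengths(M)),
  month = do.call(c, M)
)
```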
However, I have little experience in this, so there may be a much easier way to accomplish what I would like to, and I'm open to changing my approach, if that will help me get to the table I want. Thanks in advance for your help.
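One possible simpler route, sketched under assumptions: a single hypothetical helper, `expand_months()`, that parses both date columns, drops (with a warning) any row whose dates did not parse, snaps each start date to the first of its month, and uses `lengths()` in place of `vapply(M, length, 1L)` to repeat `name` and `score`. It assumes the name column is called `name`, as in the `dput()` output, and that dates are stored as `YYYY-MM-DD` strings. The usage example rebuilds the John/Tim table from the top of the question.

```r
# Hypothetical helper (not from the question): one row per calendar month
# between startdate and enddate, repeating name and score on every row.
expand_months <- function(df) {
  start <- as.Date(df$startdate, "%Y-%m-%d")
  end <- as.Date(df$enddate, "%Y-%m-%d")

  # Rows whose dates did not parse would make seq() fail, so drop them.
  ok <- !is.na(start) & !is.na(end)
  if (any(!ok)) {
    warning(sum(!ok), " row(s) dropped because a date did not parse")
  }
  df <- df[ok, , drop = FALSE]
  start <- start[ok]
  end <- end[ok]

  # Snap each start to the 1st so every generated month is dated the 1st.
  start <- as.Date(format(start, "%Y-%m-01"))

  # One Date vector per row, then repeat the other columns to match.
  months <- Map(seq, start, end, by = "month")
  n <- lengths(months)

  data.frame(
    name = rep(df$name, times = n),
    score = rep(df$score, times = n),
    month = do.call(c, unname(months))
  )
}

# The example data from the first table (column named `name`, as in dput()).
people <- data.frame(
  name = c("John", "Tim"),
  startdate = c("1970-01-01", "1970-08-15"),
  enddate = c("1970-12-31", "1970-12-31"),
  score = c(133, 184)
)

expanded <- expand_months(people)
print(expanded)
```

Dropping unparseable rows with a warning keeps the expansion running while still surfacing the problem; if those rows matter, inspecting them first is the safer choice.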
`dput()` sample data:

```r
structure(list(name = c("bonner_josiah", "rogers_michael"
), score = c(0.671084337349397, 0.666867469879518), startdate = c("2009-01-01",
"2009-01-01"), enddate = c("2010-12-31", "2010-12-31")), row.names = 1:2, class = "data.frame")
```
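For reference, a sketch that runs the question's own steps on the `dput()` sample above, using the lowercase `name` column the sample actually contains. Both rows parse cleanly, and each person gets 24 monthly rows from 2009-01-01 to 2010-12-01, which suggests the error comes from rows in the full data set that are not part of this sample.

```r
# The dput() sample from the question.
dwnom <- structure(list(name = c("bonner_josiah", "rogers_michael"),
  score = c(0.671084337349397, 0.666867469879518),
  startdate = c("2009-01-01", "2009-01-01"),
  enddate = c("2010-12-31", "2010-12-31")),
  row.names = 1:2, class = "data.frame")

dwnom$startdate <- as.Date(dwnom$startdate, "%Y-%m-%d")
dwnom$enddate <- as.Date(dwnom$enddate, "%Y-%m-%d")

# Both rows parse, so no NA reaches seq().
stopifnot(!anyNA(dwnom$startdate), !anyNA(dwnom$enddate))

M <- Map(seq, dwnom$startdate, dwnom$enddate, by = "month")

# Repeat name and score once per generated month.
expanded <- data.frame(
  name = rep.int(dwnom$name, lengths(M)),
  score = rep.int(dwnom$score, lengths(M)),
  month = do.call(c, M)
)
print(head(expanded))
```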