*edit: I have provided dput() sample data at the bottom of the page.

I have a large dataframe that I am trying to expand. Specifically, I want to use the start and end date columns to produce a row for every month each person appears in the dataset. I also want to impute the values for the other columns in the expanded dataframe. Using the two segments of code below, I was able to produce the expanded dataframe I wanted. Essentially, I want to turn this:

Name startdate enddate score
John 1970-01-01 1970-12-31 133
Tim 1970-08-15 1970-12-31 184

into this:

Name score month
John 133 1970-01-01
John 133 1970-02-01
John 133 1970-03-01
John 133 1970-04-01
John 133 1970-05-01
John 133 1970-06-01
John 133 1970-07-01
John 133 1970-08-01
John 133 1970-09-01
John 133 1970-10-01
John 133 1970-11-01
John 133 1970-12-01
Tim 184 1970-08-01
Tim 184 1970-09-01
Tim 184 1970-10-01
Tim 184 1970-11-01
Tim 184 1970-12-01
dwnom$startdate <- as.Date(dwnom$startdate, "%Y-%m-%d")
dwnom$enddate <- as.Date(dwnom$enddate, "%Y-%m-%d")

M <- Map(seq, dwnom$startdate, dwnom$enddate, by = "month")

This code ran without problems a few months ago. Now it fails with the following error: Error in seq.int(r1$mon, 12 * (to0$year - r1$year) + to0$mon, by) : 'from' must be a finite number. Other answers about this specific error mostly suggest that something is wrong with the date format, so I added "%Y-%m-%d" to the as.Date calls to make the format explicit, but I still get the same error.

This is where I am stuck. Once I get this line to work, I plan to run the rest of the code below to produce the expanded dataframe I want.

dwnom$startdate <- as.character(dwnom$startdate)
dwnom$enddate <- as.character(dwnom$enddate)

dwnom <- data.frame(
    name = rep.int(dwnom$Name, vapply(M, length, 1L)),
    score = rep.int(dwnom$score, vapply(M, length, 1L)),
    month = do.call(c, M)
)

However, I have little experience in this, so there may be a much easier way to accomplish what I would like to, and I'm open to changing my approach, if that will help me get to the table I want. Thanks in advance for your help.

dput data:

structure(list(name = c("bonner_josiah", "rogers_michael"
), score = c(0.671084337349397, 0.666867469879518), startdate = c("2009-01-01", 
"2009-01-01"), enddate = c("2010-12-31", "2010-12-31")), row.names = 1:2, class = "data.frame")
lwe
  • I cannot reproduce the error with the sample data you provided. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. Share your data with a `dput()` so we can easily copy/paste into R for testing. What version of R are you using? – MrFlick Nov 29 '22 at 14:18
  • Thanks for reminding me, @MrFlick. I have added some sample data at the bottom. – lwe Nov 29 '22 at 14:30
  • I still can't replicate the error. Do you get the error about `seq.int` with the sample `dput()` you provided? If so, what are all the packages that you have loaded when this happens? – MrFlick Nov 29 '22 at 14:32
  • Thanks for your help, @MrFlick. It turns out I had some NA values in the date columns. Those were causing the error. – lwe Nov 29 '22 at 15:19
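The diagnosis in the last comment can be checked before calling `seq`. The sketch below is one possible way to do that in base R; the third row (`doe_jane`) and its values are hypothetical, added only to mimic a missing start date, and the name `clean` is likewise an assumption.

```r
# Hypothetical sample data: the third row has a missing start date,
# mimicking the NA values the asker found in the real data.
dwnom <- data.frame(
  name = c("bonner_josiah", "rogers_michael", "doe_jane"),
  score = c(0.671, 0.667, 0.5),
  startdate = c("2009-01-01", "2009-01-01", NA),
  enddate = c("2010-12-31", "2010-12-31", "2010-12-31")
)

dwnom$startdate <- as.Date(dwnom$startdate, "%Y-%m-%d")
dwnom$enddate <- as.Date(dwnom$enddate, "%Y-%m-%d")

# Flag rows where either date is missing or failed to parse.
bad <- is.na(dwnom$startdate) | is.na(dwnom$enddate)
dwnom[bad, ]  # inspect the offending rows

# Keep only complete rows before building the monthly sequences.
clean <- dwnom[!bad, ]
M <- Map(seq, clean$startdate, clean$enddate, by = "month")
lengths(M)  # number of months generated per remaining row
```

Whether rows with missing dates should be dropped or repaired depends on the data; the filter here only shows where the failing rows are.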

1 Answer

dwnom <- structure(list(name = c("bonner_josiah", "rogers_michael"),
      score = c(0.671084337349397, 0.666867469879518), 
      startdate = c(  "2009-01-01",  "2009-01-01"), 
      enddate = c("2010-12-31", "2010-12-31")),
      row.names = 1:2, class = "data.frame")

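library(tidyverse)
# Parse both date columns, build one monthly sequence per row,
# then unnest the list column so each month gets its own row.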
(result <- mutate(rowwise(dwnom),
  across(ends_with("date"), ~ as.Date(.x, "%Y-%m-%d")),
  month = list(seq(startdate, enddate, by = "month"))
) |>  unnest_longer(col = month))
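One detail from the question's desired output is worth noting: Tim starts on 1970-08-15, yet the expected months are all first-of-month dates. A sequence started from the raw start date would step from the 15th instead. The following is a minimal base-R sketch, not part of the answer above, that assumes snapping each start date to the first of its month is what is wanted; `first_of_month` and `expanded` are illustrative names.

```r
# Sample rows from the question's first table (one mid-month start).
dwnom <- data.frame(
  Name = c("John", "Tim"),
  startdate = as.Date(c("1970-01-01", "1970-08-15")),
  enddate = as.Date(c("1970-12-31", "1970-12-31")),
  score = c(133, 184)
)

# Assumption: snap each start date back to the first day of its month.
first_of_month <- as.Date(format(dwnom$startdate, "%Y-%m-01"))

# One monthly sequence per row, starting from the snapped date.
M <- Map(seq, first_of_month, dwnom$enddate, by = "month")
n <- lengths(M)

# Repeat the other columns once per generated month.
expanded <- data.frame(
  Name = rep(dwnom$Name, n),
  score = rep(dwnom$score, n),
  month = do.call(c, M)
)
expanded
```

This reuses the `Map(seq, ...)` and `do.call(c, M)` steps from the question, so it should slot into the original approach with only the start-date adjustment added.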
Nir Graham