I want to use the difference between two dates as the dependent variable in my regression model. Each date is stored across three separate numeric columns: one each for year, month, and day. My attempt so far tidies the data by removing all rows that contain NAs and then converts each column to a date:

```{r}
movies2 <- na.omit(movies)
theater_year <- as_date(movies2$thtr_rel_year) 
theater_month <- as_date(movies2$thtr_rel_month) 
theater_day <- as_date(movies2$thtr_rel_day)

dvd_year <- as_date(movies2$dvd_rel_year) 
dvd_month <- as_date(movies2$dvd_rel_month) 
dvd_day <- as_date(movies2$dvd_rel_day) 
```

Then I try to create a new column in my data set that holds the difference between the two dates:

```{r}
moviesclean <- within(movies2, {datediff <- c(dvd_year, dvd_month, dvd_day) - c(theater_year, theater_month, theater_day)})
```

This generates the message: `replacement element 1 has 1857 rows to replace 619 rows`

When I set up and run my regression model:

```{r}
model1 <- lm(datediff ~ genre + title_type + critics_score + imdb_rating + best_pic_nom + best_pic_win, data = movies2)
```

I receive the following error: `Error in model.frame.default(formula = datediff ~ genre + title_type +: variable lengths differ (found for 'genre')`

It appears I am tripling the length of the new column because I'm adding three columns together. How do I avoid this?

K.Dᴀᴠɪs
Doug T.
  • Please make this question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). As it stands, we have a verbal description of what the columns look like, but that too often is insufficient to be able to help much. For instance, how can you derive a date from just a year or just a month? I would expect `as_date` (please confirm this is from `lubridate`) would not do what you think/expect given just "1999" (`"1975-06-23"`, if you're curious). – r2evans Mar 09 '18 at 17:19
  • As mentioned above, dates must have a year, month and day. Probably you will need to `paste()` those elements together into a single character vector and _then_ convert that vector to a date. – joran Mar 09 '18 at 17:29
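
A minimal sketch of the `paste()`-then-convert approach the comments describe, using base R only. The column names come from the question, but the three rows of data are invented for illustration, so treat this as an outline under those assumptions and not a tested solution for the real `movies` data.

```r
# Toy stand-in for the `movies` data: column names follow the question,
# the values are made up for illustration.
movies2 <- data.frame(
  thtr_rel_year  = c(2013, 2001, 1996),
  thtr_rel_month = c(4, 3, 8),
  thtr_rel_day   = c(19, 14, 21),
  dvd_rel_year   = c(2013, 2001, 2001),
  dvd_rel_month  = c(7, 8, 8),
  dvd_rel_day    = c(30, 28, 21)
)

# A lone number is read as a day count since 1970-01-01, not as a year.
as.Date(1999, origin = "1970-01-01")  # "1975-06-23"

# c() concatenates, so three length-n vectors become one length-3n vector.
# That is where 1857 = 3 * 619 comes from in the question.
length(c(movies2$dvd_rel_year, movies2$dvd_rel_month, movies2$dvd_rel_day))  # 9

# Paste year, month and day into one string per row, then convert once.
theater_date <- as.Date(
  with(movies2, paste(thtr_rel_year, thtr_rel_month, thtr_rel_day, sep = "-")),
  format = "%Y-%m-%d"
)
dvd_date <- as.Date(
  with(movies2, paste(dvd_rel_year, dvd_rel_month, dvd_rel_day, sep = "-")),
  format = "%Y-%m-%d"
)

# Subtracting two Date vectors gives one difference per row, in days.
# as.numeric() drops the difftime class so lm() sees a plain number.
movies2$datediff <- as.numeric(dvd_date - theater_date)
movies2$datediff  # 102 167 1826
```

Because `datediff` is now a column of `movies2` with one value per row, a call such as `lm(datediff ~ genre + ..., data = movies2)` would find a response of the same length as the predictors. If `lubridate` is already loaded, `make_date(year, month, day)` should build the same dates without the intermediate strings.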
