
I have been unsuccessful in replicating the code from the question "count the number of days between two dates per year".

Note: Updated per feedback:

Data I have:

ID Startdt Enddt
60A 5/4/2018 1/10/2022
60B 2/4/2019 12/20/2022
60C 8/22/2015 6/20/2020

Data I want (for ID 60A):

| ID  | 2018 | 2019 | 2022 |
|-----|------|------|------|
| 60A | 242  | 365  | 9    |
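The target numbers can be checked with base R date arithmetic. Subtracting two `Date` objects gives the number of days between them, so counting both the first and the last day needs a `+ 1`. Here is a minimal check for ID 60A, assuming both endpoints are meant to be counted:

```r
# ID 60A runs from 5/4/2018 to 1/10/2022
start <- as.Date("2018-05-04")
end   <- as.Date("2022-01-10")

# Days in 2018: May 4 through Dec 31, both days included
days_2018 <- as.numeric(as.Date("2018-12-31") - start) + 1
# Days in 2022: Jan 1 through Jan 10, both days included
days_2022 <- as.numeric(end - as.Date("2022-01-01")) + 1

days_2018  # 242
days_2022  # 10
```

When both endpoints are counted, 2022 contributes 10 days, not 9. The 9 corresponds to leaving one endpoint out, so it helps to settle on one convention before comparing results.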

I get this error: `Error in auto_copy(): ! x and y must share the same src.`

Gregor Thomas
  • (1) You reference `df` (which is a base R function) and `data` (which is also a base R function); I suspect one of them is a typo. This is one reason it may be disadvantageous to use names of common functions as variable names. (2) It may also be due to `vars(date)` if `date` is not a variable within your frame. (3) `mutate_at` has been superseded by `mutate(across(...))` (over three years ago); I suggest you go through the docs to adapt to the new (much more powerful) methods. (4) This is difficult to help further with without a [reproducible dataset](https://stackoverflow.com/q/5963269). – r2evans Jun 08 '23 at 17:15
  • Please isolate the part of the code that produces the error. Chances are you'll be able to fix it once you've found the culprit. – I_O Jun 08 '23 at 17:16
  • (5) `data %>% mutate_at(vars(date), as.Date, format="%m-%d-%Y")` is storing nothing, so that change is not preserved. – r2evans Jun 08 '23 at 17:16
  • @r2evans: I used df and data for simplicity. But you are right, they are base R functions. – lilshortstop Jun 08 '23 at 17:23
  • @r2evans: should I create a new dataset? – lilshortstop Jun 08 '23 at 17:24
  • There are three things that would be useful here: (1) Provide us a representative dataset, whether it's 5 rows or 15, 3 columns or 10. Please give only what is needed to reproduce the problem; we don't need gallons of data. (2) Please run the code and show the errors _using the sample data_. If it is a different error from your real data, then perhaps the sample is not representative enough. (3) As I_O suggested, it would be informative (and reductive within the question) to show the code up until the error; nothing beyond that likely matters (yet). – r2evans Jun 08 '23 at 17:33
  • When I run the code [in this answer](https://stackoverflow.com/a/67078532/903061) on your data, if I add the line to the first mutate `across(ends_with("dt"), mdy)` to convert your character strings to `Date` class and fix the column names to match your data, it all runs fine. Could you please show the code you are running that doesn't work? – Gregor Thomas Jun 08 '23 at 18:10
  • @GregorThomas: Please forgive me, but what do you mean by "add the line to the first mutate"? – lilshortstop Jun 08 '23 at 18:29
  • @GregorThomas: Is this application of your first line correct? `sample2 <- sample%>% mutate(across(ends_with(end),mdy),` – lilshortstop Jun 08 '23 at 18:36
  • No, `sample %>% mutate(across(ends_with("dt"), mdy), date_int = interval(Startdt, Enddt), ...`. I use `ends_with("dt")` because in the data you show, the two columns that should be `Date` class are named `"Startdt"` and `"Enddt"`, and a convenient way to identify them is that both of their names **end with "dt"**, hence the code `ends_with("dt")`. – Gregor Thomas Jun 08 '23 at 18:39
  • @GregorThomas: This is super helpful feedback. I get an "unmatched opening bracket" error using the mutate line. – lilshortstop Jun 08 '23 at 18:52
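Two checks raised in this thread can be made in base R. First, `df` and `data` refer to base R functions unless a data frame has been assigned to those names. Second, the date strings need a format that matches them. The sample uses slashes (`5/4/2018`), so a dash-based format such as the `"%m-%d-%Y"` in the `mutate_at()` call quoted above yields `NA`. The sketch below uses `endsWith()` as a base R stand-in for `ends_with("dt")`. The name `sample_df` is hypothetical and was picked to avoid masking the base function `sample()`.

```r
# Without a data frame assigned to them, these names are base R functions
is.function(df)    # TRUE in a fresh session: stats::df(), the F density
is.function(data)  # TRUE in a fresh session: utils::data()

# Sample data from the question (dates stored as character strings)
sample_df <- read.table(text = "ID  Startdt    Enddt
60A 5/4/2018   1/10/2022
60B 2/4/2019   12/20/2022
60C 8/22/2015  6/20/2020", header = TRUE, stringsAsFactors = FALSE)

# A format that does not match the strings silently returns NA
as.Date("5/4/2018", format = "%m-%d-%Y")  # NA

# Base R analogue of mutate(across(ends_with("dt"), mdy)):
# convert every column whose name ends in "dt" and assign the result back
dt_cols <- endsWith(names(sample_df), "dt")
sample_df[dt_cols] <- lapply(sample_df[dt_cols], as.Date, format = "%m/%d/%Y")
str(sample_df)
```

Note that the result is assigned back to `sample_df`. Without the assignment, the converted columns would be discarded, which is the same issue as point (5) above.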

1 Answer
```r
df = read.table(text = 'ID  Startdt     Enddt
60A     5/4/2018    1/10/2022
60B     2/4/2019    12/20/2022
60C     8/22/2015   6/20/2020', header = TRUE)

library(dplyr)
library(lubridate)
library(purrr)
library(tidyr)  # unnest() and pivot_wider() come from tidyr

df %>%
  mutate(
    across(ends_with("dt"), mdy), ## added this line to convert to date class
    date_int = interval(Startdt, Enddt),
    # one row per calendar year touched by each ID's date range
    year = map2(year(Startdt), year(Enddt), seq)
  ) %>%
  unnest(year) %>%
  mutate(
    year_int = interval(
      as.Date(paste0(year, '-01-01')),
      as.Date(paste0(year, '-12-31'))
    ),
    # the part of each ID's range that falls inside that year
    year_sect = intersect(date_int, year_int),
    start_new = as.Date(int_start(year_sect)),
    end_new = as.Date(int_end(year_sect))
  ) %>%
  select(ID, start_new, end_new) %>%
  mutate(
    year = year(start_new),
    days = as.numeric(end_new - start_new) + 1 ## +1 so both the first and last day are counted
  ) %>%
  right_join(df, by = "ID") %>%
  pivot_wider(
    id_cols = c(ID, Startdt, Enddt),
    names_from = year, values_from = days,
    names_prefix = "year_",
    values_fill = list(days = 0)
  ) %>%
  mutate(days_number = rowSums(across(starts_with("year")))) ## updated this line to use `across()`
# # A tibble: 3 × 12
#   ID    Startdt   Enddt      year_2018 year_2019 year_2020 year_2021 year_2022 year_2015 year_2016 year_2017
#   <chr> <chr>     <chr>          <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
# 1 60A   5/4/2018  1/10/2022        242       365       366       365        10         0         0         0
# 2 60B   2/4/2019  12/20/2022         0       331       366       365       354         0         0         0
# 3 60C   8/22/2015 6/20/2020        365       365       172         0         0       132       366       365
# # ℹ 1 more variable: days_number <dbl>
```
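For comparison, here is a base R sketch of the same per-year counts that does not need dplyr, lubridate, purrr, or tidyr. It assumes `Startdt` and `Enddt` are already `Date` columns. It clips each range to every calendar year it spans and counts days inclusively, using the same `+ 1` convention as the answer. `days_per_year()` is a hypothetical helper, and the result is a plain matrix rather than a tibble.

```r
# Hypothetical helper: inclusive day counts for each calendar year
# touched by the range [start, end]; returns a vector named by year
days_per_year <- function(start, end) {
  years <- seq(as.integer(format(start, "%Y")), as.integer(format(end, "%Y")))
  first_day <- as.numeric(as.Date(paste0(years, "-01-01")))
  last_day  <- as.numeric(as.Date(paste0(years, "-12-31")))
  # Clip the range to each year, working in days since 1970-01-01
  clip_start <- pmax(as.numeric(start), first_day)
  clip_end   <- pmin(as.numeric(end), last_day)
  setNames(clip_end - clip_start + 1, years)
}

dates <- data.frame(
  ID      = c("60A", "60B", "60C"),
  Startdt = as.Date(c("2018-05-04", "2019-02-04", "2015-08-22")),
  Enddt   = as.Date(c("2022-01-10", "2022-12-20", "2020-06-20"))
)

# One named vector of counts per ID
counts <- lapply(seq_len(nrow(dates)), function(i) {
  days_per_year(dates$Startdt[i], dates$Enddt[i])
})
names(counts) <- dates$ID

# Wide matrix: one row per ID, one column per year, 0 where an ID has no days
all_years <- sort(unique(unlist(lapply(counts, names))))
wide <- t(sapply(counts, function(x) {
  row <- setNames(numeric(length(all_years)), all_years)
  row[names(x)] <- x
  row
}))

# Add the per-ID total, like days_number in the answer
cbind(wide, days_number = rowSums(wide))
```

Clipping with `pmax()`/`pmin()` avoids building interval objects. The trade-off is that reshaping to one column per year is done by hand. Here the year columns come out sorted, whereas `pivot_wider()` orders them by first appearance.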
Gregor Thomas
  • Thank you. I get this error message: Error in `mutate()`: ℹ In argument: `year = map2(year(Startdt), year(Enddt), seq)`. Caused by error in `map2()`: ℹ In index: 1. Caused by error in `seq.default()`: ! 'from' must be a finite number Run `rlang::last_trace()` to see where the error occurred. – lilshortstop Jun 08 '23 at 19:08
  • Do you get that error on the sample data as imported at the top of my answer? If so, maybe try updating your packages. If not, try to find some sample data that reproduces the problem. – Gregor Thomas Jun 08 '23 at 19:09
  • Also, run `rlang::last_trace()` to see if you can get more details on the error. – Gregor Thomas Jun 08 '23 at 19:10
  • No, I do not get the error on the sample data. The code works beautifully on the sample data. – lilshortstop Jun 08 '23 at 19:16
  • The issue comes in when I try to apply it to the full dataset. – lilshortstop Jun 08 '23 at 19:16
  • After running `rlang::last_trace()`, I get `└─base::seq.default(.x[[i]], .y[[i]], ...)` followed by `23. └─base::stop("'from' must be a finite number")`. – lilshortstop Jun 08 '23 at 19:18
  • Try to find some sample data that reproduces the problem. You can run the code on the first half of your data, if that works, then try the second half. If it doesn't work, try the first quarter, etc. Isolate the problem. See if you can find out what's different on the rows where it doesn't work. If you can't figure it out, post the smallest subset of data you can find that demonstrates the problem. – Gregor Thomas Jun 08 '23 at 19:44
  • Some things to look for based on the error message: do you have undefined dates? Missing values? Do you have other columns that end with `"dt"` that shouldn't be included? Do you have dates in an inconsistent format that aren't being converted correctly?... – Gregor Thomas Jun 08 '23 at 19:45
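The message in this thread can be reproduced directly. `seq()` stops when its starting point is `NA`, and `mdy()` turns strings it cannot parse into `NA` (with a warning), so `year()` of such a date is also `NA`. Following the checks suggested above, the base R sketch below lists rows with missing or unparseable dates. It also flags rows whose start falls after their end, which are worth reviewing even though they do not raise this particular error. It assumes the raw dates are `m/d/Y` strings. `find_bad_dates()` and the rows in `raw` are hypothetical.

```r
# Reproduce the error: seq() cannot start from NA
msg <- tryCatch(seq(NA_real_, 2022), error = conditionMessage)
msg  # "'from' must be a finite number"

# Hypothetical rows illustrating the problems to look for
raw <- data.frame(
  ID      = c("60A", "60D", "60E", "60F"),
  Startdt = c("5/4/2018", NA, "2019-02-04", "3/1/2021"),
  Enddt   = c("1/10/2022", "4/1/2020", "12/20/2022", "1/1/2020"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return only the rows whose dates need attention
find_bad_dates <- function(d, fmt = "%m/%d/%Y") {
  start <- as.Date(d$Startdt, format = fmt)
  end   <- as.Date(d$Enddt, format = fmt)
  d$problem <- ifelse(is.na(start) | is.na(end), "missing or unparseable date",
                      ifelse(start > end, "start after end", NA))
  d[!is.na(d$problem), ]
}

find_bad_dates(raw)
```

Running `find_bad_dates()` on the full dataset, rather than bisecting it by hand, points straight at the rows to inspect.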