calculate the time interval in different year

Question

I have a dataset with IN_TIME_date, OUT_TIME_date, the date during 2012-2018, I want to calculate every year's patient day in hospital, However, some patients' IN_TIME_date, OUT_TIME_date not in same year. How can I calculate that? Thank you very much.

enter image description here

> dput(as.data.frame(Demography_newdata0129)) structure(list(CASEID = c("023252(1)", "07597558(2)", "07597558(3)", "100520(31)", "100520(32)", "100520(33)", "100520(34)", "10056(1)", "101171(4)", "101171(5)", "101455(2)", "101557(2)", "101571(3)", "101571(4)", "101571(5)", "101571(6)", "10160(5)", "101637(2)", "101893(13)", "101893(15)", "101893(16)", "102807(4)", "102807(5)", "102862(12)"), IN_TIME_date = c("2017-02-25", "2015-10-23", "2016-07-06", "2013-01-23", "2013-03-12", "2013-06-13", "2013-10-08", "2016-02-20", "2015-09-24", "2015-10-19", "2014-05-01", "2015-12-11", "2014-08-26", "2015-07-21", "2016-01-06", "2017-03-20", "2014-04-14", "2017-04-25", "2014-08-10", "2017-02-06", "2017-04-12", "2016-01-19", "2016-06-08", "2012-10-19"), OUT_TIME_date = c("2017-03-02", "2015-12-05", "2016-07-15", "2013-01-28", "2013-03-18", "2013-06-18", "2013-10-15", "2016-02-29", "2015-10-19", "2015-11-02", "2014-05-28", "2016-01-15", "2015-07-21", "2016-01-06", "43179", "2017-12-14", "2014-06-14", "2017-05-09", "2014-08-21", "2017-02-11", "2017-04-20", "2016-01-24", "2016-06-15", "2013-01-25"), LOS = c(5, 43, 9, 5, 6, 5, 7, 9, 25, 14, 27, 35, 329, 169, 804, 269, 61, 14, 11, 5, 8, 5, 7, 98 ),2012= c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 73),2013= c(0, 0, 0, 5, 6, 5, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25),2014= c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 127, 0, 0, 0, 61, 0, 11, 0, 0, 0, 0, 0),2015= c(0, 43, 0, 0, 0, 0, 0, 0, 25, 14, 0, 20, 201, 163, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),2016= c(0, 0, 9, 0, 0, 0, 0, 9, 0, 0, 0, 15, 0, 5, 360, 0, 0, 0, 0, 0, 0, 5, 7, 0),2017= c(5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 365, 269, 0, 14, 0, 5, 8, 0, 0, 0),2018= c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -24L), class = "data.frame")

Welcome to SO. Please review [how to ask](https://stackoverflow.com/help/how-to-ask) questions, and then provide a [minimal reproducible example/attempt](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including sample data. PS. Screenshots are never a good idea, as we can't copy&paste data. — Maurits Evers, Jan 29 '19 at 10:58

Maurits Evers · Accepted Answer · 2019-01-30T13:20:12.027

First off, two comments:

Your dput throws errors. Firstly, you are missing single ticks for the date column names; secondly there seems to be an error for OUT_TIME_date of CASEID = 101571(5). For future posts, please double-check that sample data (1) doesn't contain R syntax errors, and (2) is correct & representative.
There are also mistakes in your expected output, where you don't include the last day for stays that extend past one year.

Here is one option making use of lubridate functions to deal with Dates

library(tidyverse)
library(lubridate)
df %>%
    mutate_at(vars(ends_with("date")), ~as.Date(.x, format = "%Y-%m-%d")) %>%
    mutate(
        LOS = difftime(OUT_TIME_date, IN_TIME_date, units = "days"),
        grp = map2(
            IN_TIME_date,
            OUT_TIME_date,
            ~year(seq.Date(.x, .y - 1, by = "day")))) %>%
    unnest() %>%
    group_by_all() %>%
    tally() %>%
    spread(grp, n, fill = 0) %>%
    ungroup() %>%
    as.data.frame()
#        CASEID IN_TIME_date OUT_TIME_date      LOS 2012 2013 2014 2015 2016
#1    023252(1)   2017-02-25    2017-03-02   5 days    0    0    0    0    0
#2  07597558(2)   2015-10-23    2015-12-05  43 days    0    0    0   43    0
#3  07597558(3)   2016-07-06    2016-07-15   9 days    0    0    0    0    9
#4   100520(31)   2013-01-23    2013-01-28   5 days    0    5    0    0    0
#5   100520(32)   2013-03-12    2013-03-18   6 days    0    6    0    0    0
#6   100520(33)   2013-06-13    2013-06-18   5 days    0    5    0    0    0
#7   100520(34)   2013-10-08    2013-10-15   7 days    0    7    0    0    0
#8     10056(1)   2016-02-20    2016-02-29   9 days    0    0    0    0    9
#9    101171(4)   2015-09-24    2015-10-19  25 days    0    0    0   25    0
#10   101171(5)   2015-10-19    2015-11-02  14 days    0    0    0   14    0
#11   101455(2)   2014-05-01    2014-05-28  27 days    0    0   27    0    0
#12   101557(2)   2015-12-11    2016-01-15  35 days    0    0    0   21   14
#13   101571(3)   2014-08-26    2015-07-21 329 days    0    0  128  201    0
#14   101571(4)   2015-07-21    2016-01-06 169 days    0    0    0  164    5
#15   101571(5)   2016-01-06    2018-03-20 804 days    0    0    0    0  361
#16   101571(6)   2017-03-20    2017-12-14 269 days    0    0    0    0    0
#17    10160(5)   2014-04-14    2014-06-14  61 days    0    0   61    0    0
#18   101637(2)   2017-04-25    2017-05-09  14 days    0    0    0    0    0
#19  101893(13)   2014-08-10    2014-08-21  11 days    0    0   11    0    0
#20  101893(15)   2017-02-06    2017-02-11   5 days    0    0    0    0    0
#21  101893(16)   2017-04-12    2017-04-20   8 days    0    0    0    0    0
#22   102807(4)   2016-01-19    2016-01-24   5 days    0    0    0    0    5
#23   102807(5)   2016-06-08    2016-06-15   7 days    0    0    0    0    7
#24  102862(12)   2012-10-19    2013-01-25  98 days   74   24    0    0    0
#   2017 2018
#1     5    0
#2     0    0
#3     0    0
#4     0    0
#5     0    0
#6     0    0
#7     0    0
#8     0    0
#9     0    0
#10    0    0
#11    0    0
#12    0    0
#13    0    0
#14    0    0
#15  365   78
#16  269    0
#17    0    0
#18   14    0
#19    0    0
#20    5    0
#21    8    0
#22    0    0
#23    0    0
#24    0    0

The idea is to generate a day-by-day sequence of dates based on IN_TIME_date and OUT_TIME_date; we then extract only the years from those sequences and tally the number of years per CASEID. The rest is basic tidying/reshaping of your data.

df <- structure(list(CASEID = c("023252(1)", "07597558(2)", "07597558(3)",
"100520(31)", "100520(32)", "100520(33)", "100520(34)", "10056(1)",
"101171(4)", "101171(5)", "101455(2)", "101557(2)", "101571(3)",
"101571(4)", "101571(5)", "101571(6)", "10160(5)", "101637(2)",
"101893(13)", "101893(15)", "101893(16)", "102807(4)", "102807(5)",
"102862(12)"), IN_TIME_date = c("2017-02-25", "2015-10-23", "2016-07-06",
"2013-01-23", "2013-03-12", "2013-06-13", "2013-10-08", "2016-02-20",
"2015-09-24", "2015-10-19", "2014-05-01", "2015-12-11", "2014-08-26",
"2015-07-21", "2016-01-06", "2017-03-20", "2014-04-14", "2017-04-25",
"2014-08-10", "2017-02-06", "2017-04-12", "2016-01-19", "2016-06-08",
"2012-10-19"), OUT_TIME_date = c("2017-03-02", "2015-12-05",
"2016-07-15", "2013-01-28", "2013-03-18", "2013-06-18", "2013-10-15",
"2016-02-29", "2015-10-19", "2015-11-02", "2014-05-28", "2016-01-15",
"2015-07-21", "2016-01-06", "2018-03-20", "2017-12-14", "2014-06-14",
"2017-05-09", "2014-08-21", "2017-02-11", "2017-04-20", "2016-01-24",
"2016-06-15", "2013-01-25")), row.names = c(NA, -24L), class = "data.frame")

calculate the time interval in different year

1 Answers1