0

I cannot understand why I get incorrect differences (in days) between 2 dates. From the codes below, I think it is a formatting issue which I am unable to fix. The TZ is Europe/Berlin.

This is my data:

dat <- structure(list(date_until = structure(c(18657, 18634, 18653, 
18637, 18636, 18650, 18653, 18654, 18639, 18638, 18635, 18639, 
18652, 18655, 18647, 18642, 18659, 18640, 18663, 18636, 18643, 
18661, 18647, 18644, 18641, 18640, 18661, 18642, 18653, 18635, 
18642, 18639, 18670, 18644, 18635, 18645, 18650, 18640, 18635, 
18637), class = "Date"), date_from = structure(c(18635, 18624, 
18644, 18636, 18634, 18643, 18644, 18637, 18629, 18626, 18622, 
18637, 18645, 18646, 18642, 18641, 18645, 18636, 18651, 18634, 
18635, 18641, 18634, 18634, 18639, 18634, 18635, 18629, 18644, 
18624, 18635, 18620, 18617, 18634, 18623, 18597, 18639, 18626, 
18634, 18615), class = "Date")), row.names = c(8L, 45L, 48L, 
59L, 68L, 128L, 224L, 332L, 370L, 448L, 452L, 453L, 547L, 754L, 
891L, 962L, 993L, 1019L, 1047L, 1076L, 1101L, 1208L, 1227L, 1390L, 
1446L, 1544L, 1640L, 1687L, 1714L, 1788L, 1926L, 2031L, 2041L, 
2052L, 2059L, 2092L, 2096L, 2166L, 2206L, 2273L), class = "data.frame")

How the data looks like:

str(dat)
'data.frame':   40 obs. of  2 variables:
$ date_until: Date, format: "2021-01-30" "2021-01-07" "2021-01-26" "2021-01-10" ...
$ date_from : Date, format: "2021-01-08" "2020-12-28" "2021-01-17" "2021-01-09" ...

This is what I tried and what I get are incorrect values:

dat$diff_in_days = difftime(dat$date_until, dat$date_from, units = "days") # days
dat$diff_in_days <- sort(dat$diff_in_days) 
dat
date_until  date_from diff_in_days
2021-01-30 2021-01-08       1 days
2021-01-07 2020-12-28       1 days
2021-01-26 2021-01-17       1 days
2021-01-10 2021-01-09       2 days
2021-01-09 2021-01-07       2 days
2021-01-23 2021-01-16       2 days
2021-01-26 2021-01-17       2 days
2021-01-27 2021-01-10       4 days
2021-01-12 2021-01-02       5 days
2021-01-11 2020-12-30       6 days
2021-01-08 2020-12-26       7 days
2021-01-12 2021-01-10       7 days

I also tried:

dat <- dat %>%
        mutate(
        date_from = ymd(date_from),
        date_until = ymd(date_until),
        diff_in_days = date_until - date_from) %>%
        arrange(diff_in_days)
dat
   date_until  date_from diff_in_days
1  2021-01-10 2021-01-09       1 days
2  2021-01-15 2021-01-14       1 days
3  2021-01-08 2021-01-07       1 days
4  2021-01-09 2021-01-07       2 days
5  2021-01-12 2021-01-10       2 days
6  2021-01-09 2021-01-07       2 days
7  2021-01-14 2021-01-12       2 days
8  2021-01-13 2021-01-09       4 days
9  2021-01-20 2021-01-15       5 days
10 2021-01-13 2021-01-07       6 days
11 2021-01-23 2021-01-16       7 days
12 2021-01-25 2021-01-18       7 days
13 2021-01-15 2021-01-08       7 days
14 2021-01-16 2021-01-08       8 days
15 2021-01-26 2021-01-17       9 days
16 2021-01-26 2021-01-17       9 days
17 2021-01-28 2021-01-19       9 days
18 2021-01-26 2021-01-17       9 days
19 2021-01-07 2020-12-28      10 days
20 2021-01-12 2021-01-02      10 days

Also this is incorrect:

library(tidyverse)
library(lubridate) 
dat <- dat %>%
  mutate(diff_in_days = time_length(difftime(date_from, date_until), "days"))
dat
date_until  date_from diff_in_days
2021-01-30 2021-01-08          -22
2021-01-07 2020-12-28          -10
2021-01-26 2021-01-17           -9
2021-01-10 2021-01-09           -1
2021-01-09 2021-01-07           -2

1 Answers1

4

You should not sort but order :

dat$diff_in_days = difftime(dat$date_until, dat$date_from, units = "days") # days
dat <- dat[order(dat$diff_in_days), ]

Or with dplyr :

library(dplyr)
dat %>%
  mutate(diff_in_days = difftime(date_until, date_from, units = "days")) %>%
  arrange(diff_in_days)

#   date_until  date_from diff_in_days
#1  2021-01-10 2021-01-09       1 days
#2  2021-01-15 2021-01-14       1 days
#3  2021-01-08 2021-01-07       1 days
#4  2021-01-09 2021-01-07       2 days
#5  2021-01-12 2021-01-10       2 days
#6  2021-01-09 2021-01-07       2 days
#7  2021-01-14 2021-01-12       2 days
#8  2021-01-13 2021-01-09       4 days
#9  2021-01-20 2021-01-15       5 days
#10 2021-01-13 2021-01-07       6 days
#....
#....

This might be a good read to understand the difference between sort and order. Difference between sort(), rank(), and order()

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thanks. Could you please explain why? However, this does not solve the issue. Does it? – Lazline Brown Apr 27 '21 at 12:50
  • @LazlineBrown What is the issue after using the answer? – Ronak Shah Apr 27 '21 at 12:53
  • I guess I'm not good at counting, but `45 2021-01-07 2020-12-28 10 days` seems as it should be 11. Perhaps. Essentially all that cross the new year are a -day off . – Chris Apr 27 '21 at 13:00
  • 10 days is correct. Maybe it's easier to break it down: Dec 28 - Dec 31 = 3 days; Dec 31 - Jan 01 = 1 day; Jan 01 to Jan 07 = 6 days. 3 + 1 + 6 = 10. – jkpate Apr 27 '21 at 13:04
  • Thinking in days inclusive, apparently, which is not like a number line, but I'd like to breathe on all days, inclusive, hence 4 for Dec 28 - Dec 31. And otherwise the answer is completely correct. – Chris Apr 27 '21 at 13:13
  • How about the number of days between Dec 28 and Dec 28? :p – jkpate Apr 27 '21 at 13:17