
I am working on a project involving delivery timing data. The values can be either negative (the delivery arrived ahead of the estimate) or positive (the delivery was late).

I would like to obtain the five-number summary and interquartile range of the data using fivenum(). However, the conversion to time objects turns every value positive, so my statistics are not accurate. The following is an example of the data I am working with:

  Delivery.Late Reaction.Time Time.Until.Send.To.Vendor
1      00:01:29      00:00:00                  00:05:08
2      00:12:19      00:00:00                  00:04:52
3      00:02:55      00:00:00                  00:05:42
4      00:06:14      00:00:00                  00:14:34
5     -00:06:05      00:00:00                  00:01:42
6      00:09:58      00:00:00                  00:02:56

From this, I am interested in the Delivery.Late variable and would like to perform exploratory / diagnostic statistics on it.

I have used the chron package to convert the column data into chronological objects but chron(object) always takes the absolute value of the time and turns it into a positive value. Here is a sample of my code:

library(chron)
feb_01_07 <- read.csv("~/filepath/data.csv")
# convert the Delivery.Late column (read in as factor/character) to chron times
feb_01_07$Delivery.Late <- chron(times = feb_01_07$Delivery.Late)
# five-number summary for the Delivery.Late column
fivenum(feb_01_07$Delivery.Late, na.rm = TRUE)

After running fivenum() I get the results:

[1] 00:01:29 00:02:55 00:06:09 00:09:58 00:12:19

This is inaccurate because the lowest number (the first term) should in fact be -00:06:05, not 00:01:29. Instead, -00:06:05 was converted to the positive value 00:06:05, which landed in the middle of the sorted data and was averaged into the median (00:06:09).

How can I convert the values to time objects while keeping the negative sign? Thanks so much for any insight!

DonkeyKhan
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not very helpful. What exactly do you need when you say you want to "work with" negative time values. What are you trying to accomplish? – MrFlick Feb 21 '18 at 19:48
  • Thanks for your reply @MrFlick, I have included a sample code and a little bit more detail as to what I am desiring to do: I would like to obtain the five number summary and interquartile range using the fivenum() function on the data. However, because all of the values are positive, my statistics are not accurate. – DonkeyKhan Feb 21 '18 at 19:56
  • Please try to cut down your example to the essentials and make it so that anyone can just copy and paste it into their session. Note that `times(-0.5)` works. – G. Grothendieck Feb 21 '18 at 20:33
  • Thank you for your reply @G.Grothendieck, I have edited my post once again. As for your comment about times(-0.5), I will need it to be in (hh:mm:ss) format while still retaining the negative sign, which it doesn't seem to do. For instance, in my fivenum() output, the lowest number should be -00:06:05 and not 00:01:29. – DonkeyKhan Feb 21 '18 at 20:55
  • `-times("01:02:03")` gives a negative value. – G. Grothendieck Feb 21 '18 at 20:57
  • Incredible, thanks so much @G.Grothendieck! I guess I will have to create a for loop for each variable and apply the regular times() if it is positive, and -times() if it is negative. – DonkeyKhan Feb 21 '18 at 21:03
  • Have elaborated in answers. – G. Grothendieck Feb 21 '18 at 21:21

2 Answers

1) chron times can represent negative times, but they are rendered as negative numbers (fractions of a day) rather than in HH:MM:SS form. We can read the strings into signed values like this:

library(chron)

# convert string in form [-]HH:MM:SS to times object
neg_times <- function(x) ifelse(grepl("-", x), - times(sub("-", "", x)), times(x))

DF <- read.table("data.dat")
test <- transform(DF, Delivery.Late = neg_times(Delivery.Late))

giving:

> test
  Delivery.Late Reaction.Time Time.Until.Send.To.Vendor
1   0.001030093      00:00:00                  00:05:08
2   0.008553241      00:00:00                  00:04:52
3   0.002025463      00:00:00                  00:05:42
4   0.004328704      00:00:00                  00:14:34
5  -0.004224537      00:00:00                  00:01:42
6   0.006921296      00:00:00                  00:02:56

and we could also define a formatting routine:

# format a possibly negative times object
format_neg_times <- function(x) {
  paste0(ifelse(x < 0, "-", ""), format(times(abs(x))))
}

format_neg_times(test[[1]])
## [1] "00:01:29"  "00:12:19"  "00:02:55"  "00:06:14"  "-00:06:05" "00:09:58" 
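To connect this back to the original goal, here is a minimal sketch of running fivenum() and IQR() on the signed values and formatting the results. It inlines the question's six Delivery.Late strings, and it parses the magnitude and sign separately (a small variant of neg_times above, so that times() never sees a leading minus). The names `sgn`, `late_days`, `five` and `spread` are illustrative only.

```r
library(chron)

# Signed "[-]HH:MM:SS" strings from the question's Delivery.Late column
delivery_late <- c("00:01:29", "00:12:19", "00:02:55",
                   "00:06:14", "-00:06:05", "00:09:58")

# Parse the magnitude with times(), then reapply the sign.
# The result is a plain numeric vector in fractions of a day.
sgn <- ifelse(grepl("^-", delivery_late), -1, 1)
late_days <- sgn * as.numeric(times(sub("^-", "", delivery_late)))

# Same formatter as defined above
format_neg_times <- function(x) {
  paste0(ifelse(x < 0, "-", ""), format(times(abs(x))))
}

# min, lower hinge, median, upper hinge, max
five <- fivenum(late_days, na.rm = TRUE)
# IQR() uses quantile(), so it can differ from upper hinge minus lower hinge
spread <- IQR(late_days, na.rm = TRUE)

format_neg_times(five)
format_neg_times(spread)
```

With this input the minimum is the -00:06:05 entry, as the question expects. Keeping the values numeric until the final formatting step avoids any sign loss inside the statistics.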

2) The example in the question only has times whose magnitude is less than 12 hours. If it is always the case that the times lie between -12:00:00 and 12:00:00, then we could represent a negative time x as x + 1, that is, wrapped around to the previous day's clock time, like this:

library(chron)

wrap_neg_times <- function(x)  times(neg_times(x) %% 1)

DF <- read.table("data.dat")
test2 <- transform(DF, Delivery.Late = wrap_neg_times(Delivery.Late))

giving:

> test2
  Delivery.Late Reaction.Time Time.Until.Send.To.Vendor
1      00:01:29      00:00:00                  00:05:08
2      00:12:19      00:00:00                  00:04:52
3      00:02:55      00:00:00                  00:05:42
4      00:06:14      00:00:00                  00:14:34
5      23:53:55      00:00:00                  00:01:42
6      00:09:58      00:00:00                  00:02:56

and the matching formatting routine, which unwraps values past noon before formatting them:

format_wrap_neg_times <- function(x) {
   format_neg_times(ifelse(x > 0.5, x - 1, x))
}
format_wrap_neg_times(test2[[1]])
## [1] "00:01:29"  "00:12:19"  "00:02:55"  "00:06:14"  "-00:06:05" "00:09:58" 
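
One caveat with the wrapped representation: statistics computed directly on the wrapped values treat 23:53:55 as the largest value rather than the smallest, so it seems safer to unwrap before calling fivenum(). A minimal base R sketch on day fractions, assuming the same -12:00:00 to 12:00:00 range as above (the variable names are illustrative):

```r
# Delivery.Late from test2 in seconds since midnight;
# -00:06:05 appears wrapped as 23:53:55 (86035 s)
wrapped_secs <- c(89, 739, 175, 374, 86035, 598)
wrapped <- wrapped_secs / 86400   # fractions of a day, as chron stores times

# Unwrap: anything past noon is taken to be a negative offset
# (this is the assumption stated in the answer, not a general rule)
unwrapped <- ifelse(wrapped > 0.5, wrapped - 1, wrapped)

fivenum(wrapped) * 86400     # maximum is 23:53:55, which is misleading
fivenum(unwrapped) * 86400   # minimum is -365 s, i.e. -00:06:05
```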

Note

The input in reproducible form:

Lines <- "
  Delivery.Late Reaction.Time Time.Until.Send.To.Vendor
1      00:01:29      00:00:00                  00:05:08
2      00:12:19      00:00:00                  00:04:52
3      00:02:55      00:00:00                  00:05:42
4      00:06:14      00:00:00                  00:14:34
5     -00:06:05      00:00:00                  00:01:42
6      00:09:58      00:00:00                  00:02:56"
cat(Lines, file = "data.dat")


G. Grothendieck

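You can do something like this:

library(chron)

delivery_late <- c("00:01:29", "00:12:19", "-00:06:05")
# indices of the entries written with a leading minus sign
not_late_idx <- grep(pattern = "^-", x = delivery_late)

# chron() drops the sign, so parse first and then negate those entries
late_times <- chron(times = delivery_late)
late_times[not_late_idx] <- -1 * late_times[not_late_idx]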
Aleh