
I have a data.table of 600,000 rows and execute the following command on it:

ranges <- mapply(function(mi, ma) {seq(from=mi, to=ma, by="days")}, mi=Moves$Start, ma=Moves$End)

I get the following error message after a while:

Error in seq.int(0, to0 - from, by) : wrong sign in 'by' argument

I have tested my code with a smaller dataset and it works fine. This leads me to think that the error is caused by particular values in the dataset. Can anybody recommend an efficient way to trace the problem row(s) in the data.table? Needless to say, manually checking 600k rows is a bit too much.

Your suggestions for finding the problem rows in the data.table are appreciated!

Jochem
  • you can start by replacing `seq(from=mi, to=ma, by="days")` with `cat(mi,ma,"\n")` to see where it fails – CHP Mar 21 '13 at 17:47
  • also are you sure it's `by="days"` and not `by=days` where days is a variable? – CHP Mar 21 '13 at 17:48
  • @geektrader I'm pretty sure the `by` argument in `seq.Date` accepts "days" – joran Mar 21 '13 at 17:53
  • @joran but error says `seq.int` – CHP Mar 21 '13 at 17:54
  • I save the warnings and errors from each step separately and look through them afterwards. See http://stackoverflow.com/q/4948361/210673 – Aaron left Stack Overflow Mar 21 '13 at 17:54
  • @geektrader Good point, yes. – joran Mar 21 '13 at 18:03
  • @geektrader: `by="days"` is correct and `by=days` results in an instant error – Jochem Mar 21 '13 at 18:37
  • @geektrader but that is an internal function. I can reproduce the error when the `from` is after the `to`; you can't go back in time by using positive time steps, only negative ones. See my answer for a reproducible call that results in the error Jochem is seeing. – Gavin Simpson Mar 21 '13 at 19:31
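Following up on the comment about saving the errors from each step, here is a minimal sketch of that idea: wrap each seq() call in tryCatch() so that a failing row returns its error object instead of aborting the whole mapply() run, then list the rows that failed. The three-row Moves table is a made-up stand-in for the real 600k-row data.

```r
# Toy stand-in for the real Moves table (hypothetical data)
Moves <- data.frame(
  Start = as.Date(c("2017-04-17", "2018-03-01", "2019-04-01")),
  End   = as.Date(c("2017-06-23", "2018-02-14", "2018-04-24"))
)

# Each call either returns its date sequence or, on failure,
# the error condition object, so mapply() never stops early
ranges <- mapply(function(mi, ma) {
  tryCatch(seq(from = mi, to = ma, by = "days"),
           error = function(e) e)
}, mi = Moves$Start, ma = Moves$End, SIMPLIFY = FALSE)

# Row numbers whose call raised an error
bad <- which(vapply(ranges, inherits, logical(1), what = "error"))
bad
```

Printing ranges[bad] then shows the error message for each failing row, and Moves[bad, ] shows the offending dates.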



The obvious solution is to turn the anonymous function into a first class, fully named function, and then you can debug the function. Or turn on the recover option and then you can step into the evaluation frames for the current stack and see the state of the variables at the point the error was raised.

myFun <- function(mi, ma) {
  seq(from=mi, to=ma, by="days")
}

gets you a named function, which you can debug via

debug(myFun)

or

debugonce(myFun)

To turn on error recovery do

op <- options(error = recover)

(you can reset that later with options(op) or options(error = NULL))

In this case I suspect that mi is greater than ma:

> myFun(Sys.Date(), Sys.Date()-1)
Error in seq.int(0, to0 - from, by) : wrong sign in 'by' argument

so you could alter myFun to see if that is the case:

myFun <- function(mi, ma) {
  if(mi > ma)
    stop("`mi` is greater than `ma`")
  seq(from=mi, to=ma, by="days")
}

That way you get a more informative error message.
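Inside mapply() the message above still does not say which row failed. A possible variant, here called myFun2 (a hypothetical name), also takes the row number, supplied by seq_along(), and puts it and both dates into the error message. The data are again a toy stand-in.

```r
# Toy stand-in for the real Moves table (hypothetical data)
Moves <- data.frame(
  Start = as.Date(c("2017-04-17", "2018-03-01", "2019-04-01")),
  End   = as.Date(c("2017-06-23", "2018-02-14", "2018-04-24"))
)

# Same check as myFun, but the message reports the row and both dates
myFun2 <- function(i, mi, ma) {
  if (mi > ma)
    stop(sprintf("row %d: Start (%s) is after End (%s)",
                 i, format(mi), format(ma)))
  seq(from = mi, to = ma, by = "days")
}

# mapply() stops at the first offending row; capture its message
msg <- tryCatch(
  mapply(myFun2, seq_along(Moves$Start), Moves$Start, Moves$End,
         SIMPLIFY = FALSE),
  error = conditionMessage
)
msg
# [1] "row 2: Start (2018-03-01) is after End (2018-02-14)"
```

Because mapply() stops at the first failure, this only reports the first bad row; the vectorised which() check in the other answer finds all of them at once.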

If that fails I'd use options(error = recover) and then drop into the evaluation call corresponding to the function and see what the values of mi and ma are.

Gavin Simpson

Overview

seq.Date()'s error message is telling you that at least one date in Moves$End (e.g. February 14, 2018) occurs before the corresponding date in Moves$Start (e.g. March 1, 2018). With a positive by, seq.Date() requires from to be on or before to, so the error stops the function from proceeding.

To identify where this occurs, use which() to find the rows where Moves$End is earlier than Moves$Start. From there, correct those dates so that they occur after Moves$Start.

# load necessary data
Moves <- data.frame( Start = as.Date( x = c("2017-04-17", "2018-03-01", "2019-04-01") )
                     , End = as.Date( x = c("2017-06-23", "2018-02-14", "2018-04-24") )
                     , stringsAsFactors = FALSE )

# try to create a sequence of dates for each row;
# this fails with the "wrong sign in 'by' argument" error
date.ranges <-
  mapply( FUN = function( mi, ma )
    seq.Date( from = mi
              , to = ma
              , by = "day" )
    , Moves$Start
    , Moves$End
    , SIMPLIFY = FALSE )

# identify the instance
# where the End date occurs
# before the Start date
wrong.end.date <-
  which( Moves$End < Moves$Start )

# view results
wrong.end.date
# [1] 2 3

# correct those End Dates
# so that they occur 
# after the Start date
Moves$End[ wrong.end.date ] <-
  as.Date( x = c("2019-02-14", "2019-04-24") )

# rerun the mapply() function
date.ranges <-
  mapply( FUN = function( mi, ma )
    seq.Date( from = mi
              , to = ma
              , by = "day" )
    , Moves$Start
    , Moves$End
    , SIMPLIFY = FALSE )

# end of script #
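One caveat for a 600k-row table: which() silently drops comparisons that evaluate to NA, so a row with a missing Start or End date is not flagged by Moves$End < Moves$Start, yet seq.Date() cannot build a range for it either. A sketch that flags both cases and builds ranges only for the rows that pass, assuming you would rather set bad rows aside than edit them by hand (the fourth row and its missing End date are invented for illustration):

```r
# Toy data; row 4 has a missing End date (hypothetical example)
Moves <- data.frame(
  Start = as.Date(c("2017-04-17", "2018-03-01", "2019-04-01", "2019-05-01")),
  End   = as.Date(c("2017-06-23", "2018-02-14", "2018-04-24", NA))
)

# Flag rows seq.Date() cannot handle: a missing date, or End before Start.
# The is.na() tests make sure NA rows come out TRUE rather than NA.
bad <- which(is.na(Moves$Start) | is.na(Moves$End) | Moves$End < Moves$Start)

# Build ranges only for the rows that pass the check
ok <- setdiff(seq_len(nrow(Moves)), bad)
date.ranges <- mapply(function(mi, ma) seq.Date(from = mi, to = ma, by = "day"),
                      Moves$Start[ok], Moves$End[ok], SIMPLIFY = FALSE)
```

Moves[bad, ] can then be inspected or fixed separately.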
Cristian E. Nuno