
I love using plyr, but sometimes the underlying data throws an error that I can't locate.

For instance, I've created a summing function that throws an error if x == 8:

library(plyr)

df <- data.frame(x = rep(1:10, 3), y = runif(30))

ddply(df,
      .(x),
      function(z) {
        # Deliberately fail for the x == 8 group
        if (z$x[1] == 8) {
          stop("There's an error somewhere.")
        }
        return(sum(z$y))
      })

Pretending I didn't know what had caused the error, is there any way to report which rows of data caused the error?

canary_in_the_data_mine
  • Have you tried using traceback() ? – Jd Baba Jul 12 '13 at 22:17
  • Ah, that looks VERY promising @Jdbaba. I get the result: `8: stop("There's an error somewhere.") at #5`. How do I interpret what `#5` means? – canary_in_the_data_mine Jul 12 '13 at 22:25
  • this isn't a very good example as your `df$y` is now all characters; regardless, just use `try` or `tryCatch` – eddi Jul 12 '13 at 23:01
  • @eddi, Right, I just noticed that and changed it back. Is there a way to wrap the entire `function (z)` in `try` or `tryCatch`? Right now since the actual use-case is quite lengthy, I'd have to wrap every single operation within the function. (And to be honest, I still haven't found a way, playing with tryCatch, to get it to report the data that caused the problem. Maybe I've gone braindead.) – canary_in_the_data_mine Jul 12 '13 at 23:10
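On the question of wrapping the entire `function (z)` rather than each operation inside it: a minimal sketch, assuming the per-group function can be passed around as a value. The helper name `report_group` and the environment `bad` are hypothetical, introduced only for illustration. The wrapper calls the original function inside a single `tryCatch`, stores the subset that failed, and re-raises the error with the group key in the message.

```r
library(plyr)

set.seed(1)
df <- data.frame(x = rep(1:10, 3), y = runif(30))

# Stand-in for a long function whose body we do not want to edit
f <- function(z) {
  if (z$x[1] == 8) stop("There's an error somewhere.")
  sum(z$y)
}

# Hypothetical storage for the subset that triggered the error
bad <- new.env()

# Hypothetical helper: wraps any per-group function in one tryCatch
report_group <- function(fun, key, store = bad) {
  function(z, ...) {
    tryCatch(fun(z, ...), error = function(e) {
      store$z <- z  # keep the offending rows for later inspection
      stop("group ", key, " = ", z[[key]][1], ": ",
           conditionMessage(e), call. = FALSE)
    })
  }
}

# try() is used here only so the script keeps running after the error
out <- try(ddply(df, .(x), report_group(f, "x")), silent = TRUE)

# The rows that caused the failure
print(bad$z)
```

The body of `f` is never touched, so the same wrapper should apply to a lengthy function that throws from somewhere deep inside.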

2 Answers

Here's an example of using tryCatch:

library(plyr)

set.seed(1)
df <- data.frame(x = rep(1:10, 3), y = runif(30))

f <- function(z) {
  if (z$x[1] == 8) {
    stop("There's an error somewhere.")
  }
  return(sum(z$y))
}

ddply(df, .(x), function(z) {
         tryCatch(f(z), error = function(e) {
              print("offending block is"); print(z)
         })
     })

#[1] "offending block is"
#  x         y
#1 8 0.6607978
#2 8 0.9919061
#3 8 0.3823880
#Error in list_to_dataframe(res, attr(.data, "split_labels")) : 
#  Results must be all atomic, or all data frames
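The final error in the output above appears because the handler returns the printed data frame for the failing group while every other group returns a number, and ddply cannot combine the two. A minimal sketch of one way around that, assuming an `NA` result for the failing group is acceptable: report the group from the handler and return `NA_real_` so the call completes.

```r
library(plyr)

set.seed(1)
df <- data.frame(x = rep(1:10, 3), y = runif(30))

f <- function(z) {
  if (z$x[1] == 8) stop("There's an error somewhere.")
  sum(z$y)
}

res <- ddply(df, .(x), function(z) {
  tryCatch(f(z), error = function(e) {
    # Report which group failed and why, then return a numeric placeholder
    message("offending block is x = ", z$x[1], ": ", conditionMessage(e))
    NA_real_
  })
})

# Groups that failed show up as NA in the result column
print(res[is.na(res$V1), ])
```

The trade-off is that processing continues past the failure, which suits finding every bad group in one pass but hides the error from any calling code.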
eddi

When you use ddply, the whole subset of the data.frame df is available inside your function as the variable z, so you can do anything you want with it there. For example,

z[z$x == 8, ]

will give you the offending rows. If you want to know which subset is throwing the error, you can do something like:

if (z$x[1] ==8) {
  stop(paste('There is an error in the', unique(z$x), 'subset of df'))
}

Otherwise, you'll have to explain more clearly what you're looking for. Including a working example data set that produces the error, along with an example of the information you want to get out, will go a long way... as it stands now, I'm just guessing!

Justin
  • Ah, pretend you can't touch anything in the code that throws the error (the `stop` function), since in my actual use case, I'm not the one throwing the error. I'd love to have a trigger that says, 'If an error has been thrown, `print(unique(z$x))`', but I can't get that to work at the moment. For instance, `if (length(warnings()) > 0) { print(first(z$x))}` doesn't help because the function stops at the error and never reaches the point of printing the value. – canary_in_the_data_mine Jul 12 '13 at 22:22
  • @canary_in_the_data_mine you can just print(...) what I wrote... or take a read of [this post and its links](http://stackoverflow.com/questions/2622777/exception-handling-in-r) for a bit of info about error handling in `R`. – Justin Jul 12 '13 at 22:24
  • Not sure I know what you mean about just `print(...)` what you wrote, @Justin. Does the updated example make more sense: since I'm not the one controlling the error any more, I'm not sure how to find the data that caused the error. – canary_in_the_data_mine Jul 12 '13 at 22:30
  • Since you still haven't provided any information about what you're doing, I can't help, but you're looking for `tryCatch`. More specifically, something like `tryCatch(sum(x$y), error=function(e) e)` will catch the error without stopping. – Justin Jul 12 '13 at 22:44
  • Hmmm. What I'm actually doing is an extremely long function that just started throwing errors with new data, so it's not very reproducible. Is there a way I can get tryCatch to return the data that caused the error? I'm less interested in suppressing the error than in finding the data that caused it. Thanks for all your help so far though, @Justin. – canary_in_the_data_mine Jul 12 '13 at 22:58
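For finding the data rather than suppressing the error, plyr's own `.inform` argument may be enough. A minimal sketch, assuming the per-group function can stay exactly as it is: with `.inform = TRUE`, ddply re-raises the error with the failing piece included in the message. The plyr documentation notes that this slows processing, so it is probably best kept for debugging runs.

```r
library(plyr)

set.seed(1)
df <- data.frame(x = rep(1:10, 3), y = runif(30))

# Stand-in for a function whose internals we cannot change
f <- function(z) {
  if (z$x[1] == 8) stop("There's an error somewhere.")
  sum(z$y)
}

# try() is used here only so the script keeps running after the error;
# calling ddply() directly would stop with the same informative message
out <- try(ddply(df, .(x), f, .inform = TRUE), silent = TRUE)

# The captured error message, which names the failing piece
cat(out)
```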