Locating the cause of data-sourced errors within ddply

Question

I love using plyr, but sometimes the underlying data throws an error that I can't locate.

For instance, I've created a summing function that throws an error if x == 8:

library(plyr)

df <- data.frame(x = rep(1:10, 3), y = runif(30))

# The per-group function fails for the x == 8 group and sums y otherwise
ddply(df,
      .(x),
      function(z) {
        if (z$x[1] == 8) {
          stop("There's an error somewhere.")
        }
        return(sum(z$y))
      })

Pretending I didn't know what had caused the error, is there any way to report which rows of data caused the error?

Ah, that looks VERY promising @Jdbaba. I get the result: `8: stop("There's an error somewhere.") at #5`. How do I interpret what `#5` means? — canary_in_the_data_mine, Jul 12 '13 at 22:25
this isn't a very good example as your `df$y` is now all characters; regardless, just use `try` or `tryCatch` — eddi, Jul 12 '13 at 23:01
@eddi, Right, I just noticed that and changed it back. Is there a way to wrap the entire `function (z)` in `try` or `tryCatch`? Right now since the actual use-case is quite lengthy, I'd have to wrap every single operation within the function. (And to be honest, I still haven't found a way, playing with tryCatch, to get it to report the data that caused the problem. Maybe I've gone braindead.) — canary_in_the_data_mine, Jul 12 '13 at 23:10

score 4 · Accepted Answer · answered Jul 12 '13 at 23:18

Here's an example of using tryCatch:

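library(plyr)

set.seed(1)
df <- data.frame(x = rep(1:10, 3), y = runif(30))

f <- function(z) {
  if (z$x[1] == 8) {
    stop("There's an error somewhere.")
  }
  return(sum(z$y))
}

ddply(df, .(x), function(z) {
  # Wrap the whole per-group call; on error, print the group that failed.
  # Note: print(z) returns z, so this group yields a data frame, not a number.
  tryCatch(f(z), error = function(e) {
    print("offending block is"); print(z)
  })
})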

#[1] "offending block is"
#  x         y
#1 8 0.6607978
#2 8 0.9919061
#3 8 0.3823880
#Error in list_to_dataframe(res, attr(.data, "split_labels")) : 
#  Results must be all atomic, or all data frames
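
The trailing `list_to_dataframe` error above comes from ddply rather than from `f`: the handler returns the value of `print(z)`, which is a data frame, while every other group returns a number. If you would rather have the call finish, here is a minimal sketch where the handler reports the group and returns a placeholder of the same type. The `message()` call and the `NA_real_` placeholder are my own choices for illustration, not part of the original answer.

```r
library(plyr)

set.seed(1)
df <- data.frame(x = rep(1:10, 3), y = runif(30))

f <- function(z) {
  if (z$x[1] == 8) {
    stop("There's an error somewhere.")
  }
  sum(z$y)
}

res <- ddply(df, .(x), function(z) {
  tryCatch(f(z), error = function(e) {
    # Report which group failed and why (assumed reporting style)
    message("offending block has x = ", z$x[1], ": ", conditionMessage(e))
    # Return a numeric placeholder so every group yields the same type
    NA_real_
  })
})
```

The failing group should then show up as a missing value in the result, so it can also be found afterwards by filtering on `is.na()`.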

Perfect; thanks. I had a number of small errors I'd made in using tryCatch, and this example cleared them all up. — canary_in_the_data_mine, Jul 13 '13 at 02:06

score 0 · Answer 2 · answered Jul 12 '13 at 22:17

When you use ddply, the whole subset of the data.frame df is available inside your function as the variable z, so you can inspect it however you like. For example, this gives you the offending rows:

z[z$x == 8, ]

If you want to know which subset is throwing the error, you can do something like:

if (z$x[1] == 8) {
  stop(paste('There is an error in the', unique(z$x), 'subset of df'))
}

Otherwise, you'll have to explain more clearly what you're looking for. Including a working example data set that triggers the error, plus an example of the information you want to get back, will go a long way... as it stands now, I'm just guessing!
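
If the function that fails cannot be edited, the same idea (putting the group's key into the error message) can be applied from the outside. A rough sketch, assuming a hypothetical helper named `with_group_context` that is not part of plyr: it wraps any per-group function and re-raises the error with the key values of the group added.

```r
library(plyr)

# Hypothetical helper (not a plyr function): wraps a per-group function so
# that any error is re-raised with the group's key values in the message.
with_group_context <- function(fun, keys) {
  function(z, ...) {
    tryCatch(fun(z, ...), error = function(e) {
      # Take the key values from the first row of the failing group
      vals <- sapply(z[1, keys, drop = FALSE], as.character)
      key_text <- paste(keys, "=", vals, collapse = ", ")
      stop("group [", key_text, "] failed: ", conditionMessage(e),
           call. = FALSE)
    })
  }
}

set.seed(1)
df <- data.frame(x = rep(1:10, 3), y = runif(30))

# Stand-in for a function whose body we cannot change
f <- function(z) {
  if (z$x[1] == 8) {
    stop("There's an error somewhere.")
  }
  sum(z$y)
}

# try() is used here only so the script keeps running after the failure
out <- try(ddply(df, .(x), with_group_context(f, "x")), silent = TRUE)
cat(out)
```

The error still stops ddply, which matches the case where you want to find the data rather than suppress the failure.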

Justin


Ah, pretend you can't touch anything in the code that throws the error (the `stop` function), since in my actual use case, I'm not the one throwing the error. I'd love to have a trigger that says, 'If an error has been thrown, `print(unique(z$x))`', but I can't get that to work at the moment. For instance, `if (length(warnings()) > 0) { print(first(z$x))}` doesn't help because the function stops at the error and never reaches the point of printing the value. – canary_in_the_data_mine Jul 12 '13 at 22:22
@canary_in_the_data_mine you can just print(...) what I wrote... or have a read of [this post and its links](http://stackoverflow.com/questions/2622777/exception-handling-in-r) for a bit of info about error handling in `R`. – Justin Jul 12 '13 at 22:24
Not sure I know what you mean about just `print(...)` what you wrote, @Justin. Does the updated example make more sense: since I'm not the one controlling the error any more, I'm not sure how to find the data that caused the error. – canary_in_the_data_mine Jul 12 '13 at 22:30
Since you still haven't provided any information about what you're doing, I can't help, but you're looking for `tryCatch`. More specifically, something like `tryCatch(sum(x$y), error=function(e) e)` will catch the error without stopping. – Justin Jul 12 '13 at 22:44
Hmmm. What I'm actually doing is an extremely long function that just started throwing errors with new data, so it's not very reproducible. Is there a way I can get tryCatch to return the data that caused the error? I'm less interested in suppressing the error than in finding the data that caused it. Thanks for all your help so far though, @Justin. – canary_in_the_data_mine Jul 12 '13 at 22:58
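
For the goal in that last comment (finding the data rather than suppressing the error), plyr's own `.inform` argument may also be worth a try. It is documented as producing informative error messages and is off by default because it slows processing. A small sketch using the toy data from the question; I have not checked how readable the report is for large groups, so treat it as a starting point.

```r
library(plyr)

set.seed(1)
df <- data.frame(x = rep(1:10, 3), y = runif(30))

f <- function(z) {
  if (z$x[1] == 8) {
    stop("There's an error somewhere.")
  }
  sum(z$y)
}

# .inform = TRUE asks plyr to include the failing piece in its error report.
# try() is used only so the script keeps running after the failure.
out <- try(ddply(df, .(x), f, .inform = TRUE), silent = TRUE)
cat(out)
```

No change to `f` is needed, which fits the case where the failing function is long or not yours to edit.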
