Python: what is the fastest way to map or comprehend calls and ignore errors?

Question

I frequently need to apply a function to a large iterator of data, but that function sometimes raises a known error that I want to ignore. Unfortunately, neither list comprehensions nor the map function has a good way to handle errors.

What is the best way to skip or otherwise deal with errors quickly in Python?

For example, say I have a list of data and a function that raises a ValueError whenever an element is a str. I want to skip those values. One way to do this would be:

result = []
for n in data:
    try:
        result.append(function(n))
    except ValueError:
        pass
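The same loop can also be written with `contextlib.suppress` from the standard library. This is a sketch with a hypothetical stand-in for `function`, since the question does not define it. `suppress` is just a context manager wrapped around `try`/`except`, so it reads more cleanly but should not be expected to run any faster.

```python
from contextlib import suppress


def function(n):
    # Hypothetical stand-in for the question's function:
    # raises ValueError for str input, doubles everything else.
    if isinstance(n, str):
        raise ValueError("strings are not supported")
    return n * 2


data = [1, "a", 2, "b", 3]

result = []
for n in data:
    # suppress(ValueError) swallows the error, so the append is skipped
    with suppress(ValueError):
        result.append(function(n))

print(result)  # [2, 4, 6]
```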

Without the error handling, the same thing can be written as:

result = [function(n) for n in data]

or

result = list(map(function, data))

I want a C-compiled approach to accomplishing the above. Something in the spirit of

result = list(map(function, data, skip_errors=True))

A default=value option would also be useful, so that an element that raises an error produces a default value instead of being dropped.
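The built-in `map` has no `skip_errors` or `default` keyword. As a reference point for the semantics being asked for, here is a minimal pure-Python sketch written as a generator. The name `map_ignoring` and its parameters are hypothetical, and because it still runs a Python-level `try`/`except` per element it will not give C speed; a Cython or C version would have to implement the same logic.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple, Type, Union

_SKIP = object()  # private sentinel meaning "drop elements that raise"


def map_ignoring(
    function: Callable[[Any], Any],
    iterable: Iterable[Any],
    exceptions: Union[Type[BaseException], Tuple[Type[BaseException], ...]] = ValueError,
    default: Any = _SKIP,
) -> Iterator[Any]:
    """Hypothetical map() variant: elements whose call raises `exceptions`
    are skipped (the default) or replaced by `default`."""
    for item in iterable:
        try:
            value = function(item)
        except exceptions:
            if default is _SKIP:
                continue  # skip_errors behaviour
            value = default  # default=value behaviour
        yield value


# int is a built-in C function; int("x") raises ValueError
print(list(map_ignoring(int, ["1", "x", "3"])))             # [1, 3]
print(list(map_ignoring(int, ["1", "x", "3"], default=0)))  # [1, 0, 3]
```

Using a private sentinel rather than `None` as the "no default" marker means `default=None` remains a valid choice.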

I'm thinking this might be something I need to write a Cython extension for.

Note: one solution would be to write the catch function from this answer in C or Cython. Then I could use it in list comprehensions and get the performance boost I want.

It's not a duplicate: the accepted answer there skips getting any data at all, and another comment suggests handling the error within the function itself! (Not C speed, unless your function is written in C/Cython.) — vitiral, Mar 02 '15 at 18:04
The "function" is currently one that my users create through a text box; it is compiled into Python and called repeatedly, so it can literally be anything. — vitiral, Mar 02 '15 at 18:14
The function is currently not cythonized, but hopefully it will be in the future through something like [numba](http://numba.pydata.org/). — vitiral, Mar 02 '15 at 19:48
Oh my god, I've been reading and saying that wrong for years! — vitiral, Mar 03 '15 at 03:23

score 0 · Answer 1 · answered Mar 02 '15 at 18:12


Why not just wrap your function in an error handler?

def spam(n):
    try:
        return function(n)
    except ValueError:
        pass
result = [spam(n) for n in data]

You can then add anything you want to the error handling. Note that this version returns None for failed elements, so you probably want to either filter the resulting list or return a default value instead. The same goes for using map.
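The wrapper can be generalized so that any function, including one compiled from the user's text box, returns a default value on error. A sketch as a decorator factory; `with_default` and `parse` are hypothetical names, and this is still a Python-level `try`/`except` per call, so it addresses convenience rather than the C-speed requirement.

```python
import functools
from typing import Any, Callable


def with_default(default: Any = None, exceptions=ValueError) -> Callable:
    """Hypothetical decorator factory: calls that raise `exceptions`
    return `default` instead of propagating the error."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # keep the wrapped function's name/docstring
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exceptions:
                return default
        return wrapper
    return decorator


@with_default(default=-1)
def parse(text: str) -> int:
    # Hypothetical user function: int() raises ValueError on non-numeric text
    return int(text)


print(list(map(parse, ["4", "oops", "6"])))  # [4, -1, 6]
```

Because the wrapper returns a value for every element, the output has the same length as the input, which avoids the filtering step that returning None requires.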

Evpok

Not fast enough. This requires a Python `try`/`except`; I want C speed as much as possible. – vitiral Mar 02 '15 at 18:15
I could just use my answer to this question: http://stackoverflow.com/questions/16610997/python-exception-handling-in-list-comprehension?nah=1#28816652 but that is also not fast enough! – vitiral Mar 02 '15 at 18:15

@GarrettLinux How do you propose to handle the Python exception raised by `function` without using Python exception mechanism? And are you sure that this is a bottleneck? – Evpok Mar 02 '15 at 18:16
See the note I added. I am proposing to write a Cython extension to handle this; I just want to make sure nothing like it already exists. – vitiral Mar 02 '15 at 18:17
Essentially, I just wish the map function had this built in. – vitiral Mar 02 '15 at 18:18
@GarrettLinux I am not sure that Cython would help: the function would still be raising an exception, so you'd still lose speed (again, if the issue is really there). – Evpok Mar 02 '15 at 18:24
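Both questions in this thread (is the error handling really the bottleneck, and does raising cost time no matter where it is caught) can be measured rather than argued. A rough sketch using the standard `timeit` module, with a hypothetical `function` standing in for the user-supplied one; timings depend on the machine and Python version, so no numbers are claimed here.

```python
import timeit


def function(n):
    # Hypothetical stand-in: raises ValueError for str input, doubles numbers.
    if isinstance(n, str):
        raise ValueError(n)
    return n * 2


def spam(n):
    # The wrapper from the answer, returning None on ValueError
    try:
        return function(n)
    except ValueError:
        return None


clean = list(range(10_000))                      # nothing raises
mixed = [n if n % 2 else str(n) for n in clean]  # every even item becomes a str and raises

timings = {
    "map, no errors": timeit.timeit(lambda: list(map(function, clean)), number=20),
    "wrapped, no errors": timeit.timeit(lambda: list(map(spam, clean)), number=20),
    "wrapped, 50% errors": timeit.timeit(lambda: list(map(spam, mixed)), number=20),
}
for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

The first two rows isolate the overhead of the wrapper itself. The third also includes the cost of actually raising, which happens inside `function` and so would remain even if the surrounding `try`/`except` were written in C or Cython.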

If you want C speed, why are you writing this in Python? – kindall Mar 02 '15 at 18:36
@Evpok, your error handling will end up adding `None`s to the list when exceptions are raised. – Padraic Cunningham Mar 02 '15 at 18:52
@PadraicCunningham indeed it will. I mentioned it. – Evpok Mar 02 '15 at 18:52
Removing them afterwards adds overhead, though, so it's not exactly an improvement. I think the correct solution would be to Cythonize the function itself. – Padraic Cunningham Mar 02 '15 at 18:54
@PadraicCunningham Not much overhead if you use a generator `result = [c for c in (spam(n) for n in data) if c is not None]`. Testing Noneity is not that expensive. Anyway, he said he wanted to use a default value. – Evpok Mar 02 '15 at 19:45
Ideally I want this to be usable with functions that are already in C as well. Obviously Cythonizing the function itself will speed things up (and eventually we hope to compile the Python code that users enter into C), but adding very fast error handling would be nice. Think about it: if you already had a Cython function and just wanted fast error handling, wouldn't you want something like this? – vitiral Mar 02 '15 at 19:46
@kindall I want as much speed as Python can give me, and we need the flexibility and dynamic attributes of Python. – vitiral Mar 02 '15 at 19:50
