3

Good day. I have been searching through related posts without getting the ideal solution I would like to find. Let me describe my problem:

I am analyzing texts from a corpus, extracting features from those texts, and storing the features in an array. Some of these features are ratios, for example the ratio of the masculine pronoun "he" to the feminine pronoun "she". For some texts the denominator will be zero, and the division will raise ZeroDivisionError.

Since I calculate about 100 of these ratios, wrapping a try / except block around every ratio calculation sounds too cumbersome.

I found out that I can do

#16,RATIO_masculine_femenine
feature_map.append(numOfHe / numOfShe if numOfShe else 0)

But that is still rather laborious. I was wondering if there is a way to state at the beginning of the script that any ZeroDivisionError should be replaced by NaN, 0, or whatever other value suits.

Thanks

gaurwraith
  • Have you had a look at the top answer [here](http://stackoverflow.com/questions/10011707/how-to-get-nan-when-i-divide-by-zero) ? –  Jan 19 '14 at 15:14

2 Answers

0

The pythonic answer is to wrap it in a function, e.g.:

def ratio(a, b):
    if b == 0:
        return 0
    else:
        return a / b

feature_map.append(ratio(numOfHe, numOfShe))

The exact form of the function depends on the rest of your code, but if you're writing a line like that hundreds of times, you should probably wrap it in a function, or at least use a loop. Also, variable names like numOfHe and numOfShe hint that you might be better served by a dict.
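As a minimal sketch of the dict idea, assuming the word counts live in a collections.Counter (as mentioned in the comments), the function can take the keys instead of the numbers. The signature, the default parameter and the sample sentence are illustrative assumptions, not part of the original code; the sketch also assumes Python 3, where / on ints is true division.

```python
from collections import Counter


def ratio(counts, numerator, denominator, default=0):
    """Return counts[numerator] / counts[denominator].

    Falls back to `default` (an assumed, configurable value) when the
    denominator count is zero. A Counter returns 0 for missing keys.
    """
    denom = counts[denominator]
    if denom == 0:
        return default
    return counts[numerator] / denom


# Hypothetical sample text, only to exercise the function.
counts = Counter("he said she would but he did not".split())

feature_map = []
feature_map.append(ratio(counts, "he", "she"))    # both words present
feature_map.append(ratio(counts, "she", "they"))  # "they" never occurs
```

Passing float("nan") as default instead of 0 keeps "undefined" ratios distinguishable from genuine zeros, if the later processing can cope with NaN.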

UPDATE

I see from your code link that each calc is actually totally different, so you probably can't easily loop it. Since the calcs are still relatively simple, you could try a trick with eval like this:

calcs = [
    ...
    (12, 'h + ha + hw + hy'),
    (13, '(h + ha) / (hw + hy)'),
    ...
]

for index, calc in calcs:
    try:
        v = eval(calc, locals())
    except ZeroDivisionError:
        v = 0
    feature_map.append(v)

You could also add other info to calcs and use a namedtuple instead of a plain tuple. You could also use classes to evaluate the calcs dynamically as you need them, if that helps.
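A rough sketch of the namedtuple variant, under some assumptions: the name field, the evaluate helper and the sample values are made up for illustration, and the variables are passed to eval in an explicit dict rather than via locals(). As with any eval, this should only be fed expression strings you wrote yourself.

```python
from collections import namedtuple

# "name" is an assumed extra field; index and expr mirror the tuples above.
Calc = namedtuple("Calc", ["index", "name", "expr"])

calcs = [
    Calc(12, "h_total", "h + ha + hw + hy"),
    Calc(13, "h_ratio", "(h + ha) / (hw + hy)"),
]


def evaluate(calcs, values, default=0):
    """Evaluate each expression against `values`, in list order."""
    results = []
    for calc in calcs:
        try:
            # Empty builtins: the expressions only need the names in `values`.
            v = eval(calc.expr, {"__builtins__": {}}, values)
        except ZeroDivisionError:
            v = default
        results.append(v)
    return results


# Hypothetical counts; hw + hy is zero, so the ratio falls back to default.
values = {"h": 3, "ha": 1, "hw": 0, "hy": 0}
feature_map = evaluate(calcs, values)
```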

aquavitae
  • If your data structure is more-or-less global, you can reference the dictionary from your `ratio` function and just provide the keys as arguments, like `ratio('he', 'she')`. – Spen-ZAR Jan 19 '14 at 15:38
  • Thanks, looks like I am going to create a "ratio" function indeed, but I feel so lazy... " / " is so convenient :) I take it there is no way to straight away disable / substitute built in errors then ? As for the variables, I think something else could be better, (some kind of numpy array?) but not a dict, I plan on accessing feature_map by index number, and also having a logical index that tells which of those features are active to a learning algorithm – gaurwraith Jan 19 '14 at 15:44
  • Actually I have a collections.Counter for the list of words, and I calculate the ratio as ratio = Counter[he] / Counter[she]. The feature_map is there to store the actual result. But yes, I think this could be implemented a lot better; the problem is I have no time :( – gaurwraith Jan 19 '14 at 15:49
  • Better to say there's no *sane* way of substituting errors globally. You might manage something with a clever context handler and doing frame magic but I really would not recommend it. The best approach is to code in such a way that you don't repeat code 100 times! And does it really take that long to write a 5 line function? There are many structures available for storing data. The best one depends on your usage. – aquavitae Jan 19 '14 at 15:52
  • The ratio function is a pretty good solution, thank you very much. Now, regarding writing list.append(ratio(x,y)) 200 times, I don't see how I could do this any other way. I have 200+ calculations to do on each text, which involve different ratios, word frequencies, summations, etc. From those I have to build a data vector (the feature_map), and the values need to be in order so that the next text has the same features with its own values. I am doing it by calculating each measure in order, then appending it after each calculation. Maybe I could use a dictionary of functions? But how would that keep the order? – gaurwraith Jan 19 '14 at 16:12
  • It's difficult to say without seeing more of your code. If you can show enough to give an idea of the problem, I might be able to update my answer with more info. – aquavitae Jan 19 '14 at 16:20
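One possible way to address the ordering question raised in the comments above, as a minimal sketch: keep the calculations in a list of (name, function) pairs, so that the position in the list is the position in the feature vector. The feature names, the lambdas and the extract helper are illustrative assumptions, and the counts are assumed to be a collections.Counter.

```python
from collections import Counter


def ratio(a, b, default=0):
    """a / b, or `default` when b is zero."""
    return a / b if b else default


# The list order defines the order of the feature vector.
FEATURES = [
    ("ratio_he_she", lambda c: ratio(c["he"], c["she"])),
    ("freq_the", lambda c: ratio(c["the"], sum(c.values()))),
    ("total_pronouns", lambda c: c["he"] + c["she"]),
]


def extract(counts):
    """Apply every feature function to one text's counts, in order."""
    return [func(counts) for _name, func in FEATURES]


# Hypothetical sample text.
feature_map = extract(Counter("he saw the cat".split()))
```

A plain dict of functions would also keep insertion order on Python 3.7 and later, but a list makes the index of each feature explicit, which fits accessing feature_map by index number.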
0

If you wrap your int objects in a custom subclass, you can handle the error in one place (this is Python 2 code, where / on ints calls __div__):

class SafeInt(int):
    def __div__(self, y):
        try:
            return SafeInt(super(SafeInt, self).__div__(y))
        except ZeroDivisionError:
            return SafeInt(0)

Overriding the name int (this affects explicit int(...) calls only; integer literals such as 5 are still plain ints):

original_int = int
int = SafeInt
int(5) / 0
# O: 0

Overriding some ints:

SafeInt(5) / 0
# O: 0

You have to be careful, though, to keep the object a SafeInt. You'll notice everything I return inside __div__ is wrapped in SafeInt(). Since int objects are immutable, every operation returns a new object, so you have to explicitly return a new SafeInt each time. That means you probably need a decorator that wraps each arithmetic method of SafeInt to ensure that. I leave that as an exercise to the reader!

Otherwise you'll end up with this:

>>> SafeInt(5) / 0
0   # this is a SafeInt object
>>> _ / 0
0   # this is a SafeInt object; no error
>>> SafeInt(5) + 0
5   # this is a basic int object
>>> _ / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
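For what it's worth, here is one way the wrapping exercise could look. This is a sketch under assumptions, not the code from this answer: it targets Python 3, where / calls __truediv__ (and returns a float) rather than __div__, and the helper _keep_safe and the list of wrapped methods are made up for illustration and are not exhaustive.

```python
class SafeInt(int):
    """int subclass whose divisions return 0 instead of raising."""

    def __truediv__(self, other):
        try:
            return int(self) / other      # float result in Python 3
        except ZeroDivisionError:
            return 0                      # fallback value is an assumption

    def __floordiv__(self, other):
        try:
            return SafeInt(int(self) // other)
        except ZeroDivisionError:
            return SafeInt(0)


def _keep_safe(name):
    """Wrap the int method `name` so that int results come back as SafeInt."""
    base = getattr(int, name)

    def method(self, *args):
        result = base(self, *args)
        # NotImplemented and non-int results pass through unchanged.
        return SafeInt(result) if type(result) is int else result

    method.__name__ = name
    return method


# Only a handful of operators are covered here.
for _name in ("__add__", "__radd__", "__sub__", "__rsub__",
              "__mul__", "__rmul__", "__neg__", "__abs__"):
    setattr(SafeInt, _name, _keep_safe(_name))

total = SafeInt(5) + 0   # stays a SafeInt thanks to the wrapper
safe = total / 0         # no ZeroDivisionError
```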

One final note: you can pass SafeInt as the argument to defaultdict to make all members SafeInt!
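A small sketch of the defaultdict idea, again assuming Python 3 and a cut-down SafeInt with only __truediv__. Note the caveat from above still applies: counts[word] + 1 yields a plain int, so this sketch re-wraps the value on every update. The sample sentence is hypothetical.

```python
from collections import defaultdict


class SafeInt(int):
    """Minimal Python 3 variant: true division returns 0 on a zero divisor."""

    def __truediv__(self, other):
        try:
            return int(self) / other
        except ZeroDivisionError:
            return 0


counts = defaultdict(SafeInt)   # missing keys are created as SafeInt() == 0

for word in "he said he saw it".split():
    # Re-wrap explicitly: SafeInt + int gives back a plain int.
    counts[word] = SafeInt(counts[word] + 1)

# "she" never occurs, so the lookup creates SafeInt(0) and the division is safe.
he_she = counts["he"] / counts["she"]
```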


Edit: Knowing you wanted it to happen to all ints, I hoped something like this might work, but it's disallowed (for good reason):

>>> def wrapdiv(olddiv):
...     def newdiv(self, y):
...         try:
...             return olddiv(self, y)
...         except ZeroDivisionError:
...             return 0
...     return newdiv
...
>>> int.__div__ = wrapdiv(int.__div__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'int'
mhlester
  • Thanks for the elaborate answer. Being in somewhat of a hurry (must present some results as of tuesday) I won't implement it now, but will definitely look into it later, thanks! – gaurwraith Jan 19 '14 at 18:57