
I am aware this has been partly answered here.

Anyway, I am not sure I'm achieving what I want. I'll briefly explain what I am doing:

  • Reading through a huge list of JSON files with a particularly nested structure.
  • Extracting the lowest-level values from them, and averaging whenever these values are lists.
  • Collecting these values into NumPy arrays.
  • Dumping my NumPy arrays into pickled files.

Everything goes quite smoothly, but I get some NumPy runtime warnings:

  • RuntimeWarning: Mean of empty slice.
  • RuntimeWarning: invalid value encountered in double_scalars
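For reference, both warnings are easy to reproduce (a minimal sketch, not from my pipeline): `np.mean` of an empty list returns `nan` and emits them.

```python
import warnings
import numpy as np

# Record the warnings np.mean emits for an empty input instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.mean([])  # returns nan and emits the runtime warnings above

print(result)  # nan
print([str(w.message) for w in caught])
```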

The function that gives me trouble is the one that actually extracts the values, performing this operation: v = np.mean(v)

I know the warning can be caused by an empty list, or by some NaN/Inf or whatever is in the value.

I would like to get rid of them simply by throwing away the current .json sample from my data set.

So I've set: np.seterr(all='warn')

And I wrote this awkward code to try to catch it:

def ExtracValues(d):
    for v in d.values():
        if isinstance(v, dict):
            yield from ExtracValues(v)
        else:
            if isinstance(v, list):
                # just averaging vectorial feature values; it may be here that
                # numpy raises the empty-slice warning
                try:
                    v = np.mean(v)
                except Warning:
                    return  # trying to trash samples which are not behaving well
            yield v

My problem is that I don't know whether it's actually working, because the warnings are still printed to stdout. I suppose the code should have stopped after setting all='warn', but how can I easily check if I am right?

Also, is there a more Pythonic way to shorten that function? I really don't like the try/except nested that way.

sparaflAsh
  • By numpy.seterr(all='warn') you tell the system to warn you every time, which is exactly the opposite of what you want. However, the warning does not seem to be caught by np.seterr, as even `np.seterr(all='ignore')` does not get rid of the problem – Jürg W. Spaak Aug 11 '17 at 09:52
  • I'm pretty sure I should use the [warnings](https://docs.python.org/2/library/warnings.html) module, but I still fail to see how and where to fix it. – sparaflAsh Aug 11 '17 at 10:14

2 Answers


Thanks to Jürg Merlin Spaak for his comment, I found a better and simpler solution. It's obviously better to catch the exception outside the function, which I reverted to its original version:

def ExtractValues(d):
    for v in d.values():
        if isinstance(v, dict):
            yield from ExtractValues(v)
        else:
            if isinstance(v, list):
                v = np.mean(v)
            yield v

I've set everything to warn in the main part of the code:

np.seterr(all='warn')

Then I catch the warnings:

with warnings.catch_warnings():
    warnings.filterwarnings('error')
    try:
        raw_features = list(ExtractValues(data))
    except Warning as e:
        print('Houston, we have a warning:', e)
        print('The bad guy is: ' + current_file)
        print('This sample will not be considered.')
    else:
        pass  # whatever processing of raw_features

Worth noting for whoever comes here with the same warnings: I succeeded in catching both, but print(e) will only report "Mean of empty slice" – presumably because the first warning is raised as an exception, so execution stops before the second is ever emitted.
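A minimal sketch of that behaviour (not from my pipeline): with filterwarnings('error'), the first warning inside the block becomes an exception, so the "invalid value" warning never fires.

```python
import warnings
import numpy as np

with warnings.catch_warnings():
    warnings.filterwarnings('error')
    try:
        np.mean([])  # "Mean of empty slice" is emitted first...
        msg = None
    except Warning as e:
        # ...and is immediately raised as an exception, so the
        # "invalid value" warning is never reached.
        msg = str(e)

print(msg)
```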

sparaflAsh

The warnings module is indeed what you need:

import warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    np.mean([])

This code will not emit any runtime warning. I guess you can adapt it for what you need; if not, tell me.
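One way to adapt this to the question's goal (a hypothetical sketch, not part of the answer): suppress the warning with a small helper and use the resulting nan to decide whether to discard the sample.

```python
import warnings
import numpy as np

def safe_mean(values):
    # Hypothetical helper: average quietly, letting nan signal a bad sample.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return np.mean(values)

# A sample is kept only if none of its averaged features is nan:
features = [safe_mean(v) for v in ([1, 2, 3], [])]
keep = not any(np.isnan(f) for f in features)
print(features, keep)
```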

Jürg W. Spaak