I am aware this has been partly answered here. Still, I am not sure I'm achieving what I want, so I'll briefly explain what I am doing (a rough sketch of the whole pipeline follows the list):
- Reading through a huge list of JSON files, each with a deeply nested structure.
- Extracting the lowest-level values from them, averaging them when they are lists.
- Collecting these values into numpy arrays.
- Dumping the numpy arrays into pickled files.
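Roughly, the pipeline looks like the sketch below. The file location, the output name and the loop are just placeholders for illustration, not my real code:

    import glob
    import json
    import pickle

    import numpy as np

    features = []
    for path in glob.glob("data/*.json"):  # placeholder location of the JSON files
        with open(path) as f:
            sample = json.load(f)
        # ExtracValues (shown further down) walks the nested dict and yields
        # the leaf values, averaging the ones that are lists
        features.append(np.array(list(ExtracValues(sample))))

    with open("features.pkl", "wb") as out:  # placeholder output file
        pickle.dump(features, out)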
Everything runs quite smoothly, but I get some numpy runtime warnings:

    RuntimeWarning: Mean of empty slice.
    RuntimeWarning: invalid value encountered in double_scalars
The function that gives me trouble is the one that actually extracts the values, and it performs this operation: v = np.mean(v)
I know these warnings can be caused by averaging an empty list, or by some NaN/Inf or other bad value in the data. I would like to get rid of such cases simply by throwing away the current .json sample from my data set.
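If it helps, the warnings can be reproduced with something as small as this (the exact wording of the second message may depend on the numpy version):

    import numpy as np

    # Averaging an empty list prints both RuntimeWarnings for me:
    # "Mean of empty slice." and "invalid value encountered in double_scalars"
    x = np.mean([])  # returns nan instead of raising
    print(x)         # nan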
So I've set np.seterr(all='warn') and I wrote this awkward code to try to catch it:
    import numpy as np

    def ExtracValues(d):
        # Recursively walk the nested dict and yield its leaf values,
        # averaging the ones that are lists.
        for v in d.values():
            if isinstance(v, dict):
                yield from ExtracValues(v)
            else:
                if isinstance(v, list):
                    # v = np.mean(v)  # just averaging vectorial feature values;
                    # it may be this call that raises numpy's empty-slice warning
                    try:
                        v = np.mean(v)
                    except Warning:
                        return  # trying to trash samples which are misbehaving
                yield v
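For reference, this is the kind of structure the function walks (the keys and values here are made up):

    # assumes ExtracValues from above is defined; the empty list under "e"
    # is the sort of leaf that triggers the warnings
    d = {"a": {"b": [1.0, 2.0, 3.0], "c": 4.0}, "d": {"e": []}}
    print(list(ExtracValues(d)))  # [2.0, 4.0, nan] plus the RuntimeWarnings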
My problem is that I don't know whether this is actually working, because the warnings are still printed to the console. I supposed the code would have stopped there after setting all='warn', but how can I easily check whether I am right?
Also, is there a more Pythonic way to shorten that function? I really don't like the try/except nested that way.