I'm new to Python and I'm performing a basic exploratory data analysis (EDA) on two similar SFrames. Two of my columns contain dictionaries, and I'm trying to find out whether the maximum entry of each dictionary is the same or not. In the end I want to sum the Value_Match column to find out how many rows match, but I'm getting a nasty error and haven't been able to find its source. The weird thing is that I used the same approach on both SFrames, and only one of them gives me this error.

I have tried computing max_func in the different ways given here, but the same error persists: getting-key-with-maximum-value-in-dictionary

I have checked the columns for NaN values but didn't find any.

I have been stuck on this for a while, and any help would be much appreciated. Thanks!

Code:

def max_func(d):
    v=list(d.values())
    k=list(d.keys())
    return k[v.index(max(v))]

sf['Max_Dic_1'] = sf['Dic1'].apply(max_func)
sf['Max_Dic_2'] = sf['Dic2'].apply(max_func)
sf['Value_Match'] = sf['Max_Dic_1'] == sf['Max_Dic_2']
sf['Value_Match'].sum()

Error :

RuntimeError                              Traceback (most recent call last)
<ipython-input-70-f406eb8286b3> in <module>()
----> 1 x = sf['Value_Match'].sum()
  2 y = sf.num_rows()
  3 
  4 print x
  5 print y

C:\Users\rakesh\Anaconda2\lib\site-packages\graphlab\data_structures\sarray.pyc in sum(self)
 2216         """
 2217         with cython_context():
-> 2218             return self.__proxy__.sum()
 2219 
 2220     def mean(self):

C:\Users\rakesh\Anaconda2\lib\site-packages\graphlab\cython\context.pyc in __exit__(self, exc_type, exc_value, traceback)
 47             if not self.show_cython_trace:
 48                 # To hide cython trace, we re-raise from here
---> 49                 raise exc_type(exc_value)
 50             else:
 51                 # To show the full trace, we do nothing and let exception propagate

RuntimeError: Runtime Exception. Exception in python callback function evaluation: 
ValueError('max() arg is an empty sequence',): 
Traceback (most recent call last):
File "graphlab\cython\cy_pylambda_workers.pyx", line 426, in graphlab.cython.cy_pylambda_workers._eval_lambda
File "graphlab\cython\cy_pylambda_workers.pyx", line 169, in graphlab.cython.cy_pylambda_workers.lambda_evaluator.eval_simple
File "<ipython-input-63-b4e3c0e28725>", line 4, in max_func
ValueError: max() arg is an empty sequence
  • The error is not in the `sum` itself. `max_func` is applied lazily, and the problem is that for an empty dictionary there is no `max`... – Willem Van Onsem May 09 '17 at 08:23

1 Answer

To debug this problem, look at the stack trace. Its last two lines read:

File "<ipython-input-63-b4e3c0e28725>", line 4, in max_func
ValueError: max() arg is an empty sequence

Python is thus telling you that you are trying to calculate the maximum of a list with no elements. This happens when the dictionary is empty, so one of your SFrames probably contains an empty dictionary {}.
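
As a minimal sketch, the failure can be reproduced in plain Python without GraphLab. The `rows` list below is a hypothetical stand-in for one dictionary column, and `empty_rows` is just an illustrative name for the positions that would trigger the error:

```python
def max_func(d):
    # Approach from the question: max(v) raises ValueError when d is empty.
    v = list(d.values())
    k = list(d.keys())
    return k[v.index(max(v))]

# Hypothetical sample standing in for a dictionary column.
rows = [{"a": 1, "b": 3}, {}, {"c": 2}]

# Positions of the rows whose dictionary is empty.
empty_rows = [i for i, d in enumerate(rows) if not d]

# Calling max_func on an empty dictionary reproduces the ValueError.
try:
    max_func(rows[1])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(empty_rows)
print(error_message)
```

Running the same kind of emptiness check over the real column should show which rows are responsible.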

The question is what to do when the dictionary is empty. You might decide to return None in that case.

Nevertheless, the code you wrote is more complicated than it needs to be. A simpler and more efficient version would be:

def max_func(d):
    if d:
        return max(d, key=d.get)
    else:
        # or return some other default if the dictionary is empty
        return None
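
A rough usage sketch, with plain Python lists standing in for the two SFrame columns (the sample data and the names `dic1`, `dic2`, and `match_count` are made up for illustration). One design choice to be aware of: if both dictionaries in a row are empty, both keys are None and would compare as equal, so this sketch only counts a match when a real key was found:

```python
def max_func(d):
    # Key with the largest value, or None for an empty dictionary.
    if d:
        return max(d, key=d.get)
    return None

# Hypothetical sample data standing in for the Dic1 and Dic2 columns.
dic1 = [{"a": 1, "b": 3}, {}, {"c": 2}, {}]
dic2 = [{"a": 5, "b": 4}, {"x": 1}, {"c": 9}, {}]

max_1 = [max_func(d) for d in dic1]
max_2 = [max_func(d) for d in dic2]

# Count a match only when a key exists, so two empty dictionaries
# (None == None) are not reported as agreeing.
value_match = [a is not None and a == b for a, b in zip(max_1, max_2)]
match_count = sum(value_match)

print(match_count)
```

Whether two empty dictionaries should count as a match depends on the analysis; drop the `a is not None` check if they should.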
Willem Van Onsem