I am refactoring code that used to be very manual: it set the index on each new data frame I created. The goal is to produce this output:

    f1          precision   recall
A   0.600315956 0.72243346  0.513513514
B   0.096692112 0.826086957 0.051351351
C   0.085642317 0.62962963  0.045945946
D   0.108641975 0.628571429 0.059459459

Here is my current code:

summaryDF = pd.DataFrame().set_index(['A','B','C','D'])

def evaluation(trueLabels, evalLabels):

    precision = precision_score(trueLabels, evalLabels)
    recall = precision_score(trueLabels, evalLabels)
    f1 = precision_score(trueLabels, evalLabels)
    accuracy = accuracy_score(trueLabels, evalLabels)

    data = {'precision': precision,
               'recall': recall,
               'f1': f1}

    DF = pd.DataFrame(data)

    summaryDF.concat(DF,ignore_index=True)


results = [y_randpred,y_cat_random_to_binary,y_cat_random_to_binary_threshold,y_closed_random_to_binary]

for result in results:
    evaluation(y_true_claim, result)

Here is my error trace:

Traceback (most recent call last):
  File "/Users/dhruv/Documents/bla/bla/src/main/bla.py", line 419, in <module>
    summaryDF = pd.DataFrame().set_index(['A','B','C','D'])
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 2607, in set_index
    level = frame[col].values
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "/Users/dhruv/anaconda/lib/python2.7/site-packages/pandas/core/index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
  File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)
  File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: 'A'

Any idea what I am doing wrong?

Dhruv Ghulati

1 Answer

I solved my problem.

Using this answer, my code becomes:

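import pandas as pd
from sklearn.metrics import precision_score, recall_score, f1_score

# Start with an empty frame that already has the metric columns.
summaryDF = pd.DataFrame(columns=('precision','recall','f1'))

def evaluation(trueLabels, evalLabels):

    # pd.concat returns a new frame, so rebind the module-level name.
    global summaryDF

    precision = precision_score(trueLabels, evalLabels)
    recall = recall_score(trueLabels, evalLabels)
    f1 = f1_score(trueLabels, evalLabels)

    # Wrap each scalar in a list so the dict describes a one-row frame.
    data = {'precision': [precision],
            'recall': [recall],
            'f1': [f1]}

    DF = pd.DataFrame(data)

    summaryDF = pd.concat([summaryDF,DF])

results = [y_randpred,
           y_cat_random_to_binary,
           y_cat_random_to_binary_threshold,
           y_closed_random_to_binary,
           y_closedCat_random_to_binary_threshold]

for result in results:
    evaluation(y_true_claim, result)

# Label the rows once, after every result has been appended.
summaryDF.index = ['A', 'B', 'C', 'D', 'E']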

The key points are that precision, recall and F1 each have to be wrapped in square brackets, so that the dict describes a one-row data frame rather than bare scalars, and that the index is assigned afterwards via summaryDF.index instead of the set_index method.
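Both points can be checked in isolation. The snippet below is a minimal sketch with made-up score values (they are not from my data); it only assumes that pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

# Illustrative values only, not real results.
scores = {'precision': 0.72, 'recall': 0.51, 'f1': 0.60}

# A dict of bare scalars gives pandas no row to build, so this raises.
try:
    pd.DataFrame(scores)
    scalar_error = None
except ValueError as exc:
    scalar_error = exc

# Wrapping each value in a list yields a one-row frame.
row = pd.DataFrame({name: [value] for name, value in scores.items()})

# set_index expects names of existing columns; 'A' is not a column here.
try:
    row.set_index(['A'])
    set_index_error = None
except KeyError as exc:
    set_index_error = exc

# Assigning to .index replaces the row labels directly.
row.index = ['A']
```

After this runs, scalar_error holds a ValueError, set_index_error holds a KeyError, and row is a single row labelled 'A'.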

So I only append, and set the index at the end rather than at the start. set_index expects the names of existing columns, which an empty data frame does not have, and that is where the KeyError: 'A' came from.
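An alternative that avoids both the global and the repeated concatenation is to collect one dict per result and build the frame once, passing the row labels up front. This is only a sketch of that idea, not what my code above does: build_summary is a hypothetical helper, and the numbers are placeholders standing in for the real scores.

```python
import pandas as pd

def build_summary(rows, labels):
    """Build a summary frame from a list of metric dicts (hypothetical helper).

    rows   -- one dict per evaluated result, keyed by metric name
    labels -- row labels, in the same order as rows
    """
    return pd.DataFrame(rows, index=labels,
                        columns=['f1', 'precision', 'recall'])

# Placeholder values; in practice each dict would come from the score functions.
rows = [
    {'precision': 0.72, 'recall': 0.51, 'f1': 0.60},
    {'precision': 0.83, 'recall': 0.05, 'f1': 0.10},
]

summary = build_summary(rows, ['A', 'B'])
```

Here the index is supplied to the DataFrame constructor, so there is no need to call set_index or to reassign summary.index afterwards.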
