
I have a dictionary called ret_docs of retrieved documents (values) based on queries (keys) that looks like this:

{'q1': ['d51', 'd874', 'd486', 'd329', 'd114'],
 'q2': ['d51', 'd1147', 'd12', 'd100', 'd114'],
 'q3': ['d707', 'd144', 'd542', 'd329', 'd395'],
 'q4': ['d189', 'd575', 'd179', 'd1182', 'd160'],
 'q5': ['d730', 'd329', 'd1066', 'd14', 'd163'],
 'q6': ['d798', 'd97', 'd99', 'd927', 'd1195'],
 'q7': ['d189', 'd1347', 'd423', 'd1040', 'd174'],
 'q8': ['d234', 'd122', 'd189', 'd160', 'd197'],
 'q9': ['d270', 'd45', 'd97', 'd123', 'd193'],
 'q10': ['d493', 'd302', 'd1199', 'd949', 'd1214']}

And a dictionary of relevance judgements (the relevant documents for each query) called reljudges:

{'q1': ['d184',
  'd29',
  'd31',
  'd12',
  'd51',
  'd102',
  'd13',
  'd14',
  'd15',
  'd57',
  'd378',
  'd859',
  'd185',
  'd30',
  'd37',
  'd52',
  'd142',
  'd195',
  'd875',
  'd56',
  'd66',
  'd95',
  'd462',
  'd497',
  'd858',
  'd876',
  'd879',
  'd880'],
 'q2': ['d12',
  'd15',
  'd184',
  'd858',
  'd51',
  'd102',
  'd202',
  'd14',
  'd52',
  'd380',
  'd746',
  'd859',
  'd948',
  'd285',
  'd390',
  'd391',
  'd442',
  'd497',
  'd643',
  'd856',
  'd857',
  'd877',
  'd864',
  'd658'],
 'q3': ['d5', 'd6', 'd90', 'd91', 'd119', 'd144', 'd181', 'd399'],
 'q4': ['d236', 'd166'],
 'q5': ['d552', 'd401', 'd1297', 'd1296'],
 'q6': ['d99', 'd115', 'd257', 'd258'],
 'q7': ['d20', 'd56', 'd57', 'd58', 'd19'],
 'q8': ['d48',
  'd122',
  'd20',
  'd58',
  'd196',
  'd354',
  'd360',
  'd197',
  'd999',
  'd1112',
  'd1005'],
 'q9': ['d21', 'd22', 'd550'],
 'q10': ['d259', 'd405', 'd302', 'd436', 'd437', 'd438', 'd998', 'd1011']
}

I have written up this code to calculate the "precision @ N":

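# n == -1 (or an n larger than the result list) means "use every retrieved document"
if n != -1 and n <= len(list(ret_docs.values())[0]):
    ret_docs = {key: value[:n] for key, value in ret_docs.items()}

# For each query, keep only the retrieved documents that are judged relevant
rel_ret = {}
for key, val in ret_docs.items():
    rel_ret[key] = [doc for doc in val if doc in reljudges[key]]

# Precision = relevant retrieved / retrieved, for queries present in both dicts
prec = {k: len(rel_ret[k]) / len(ret_docs[k]) for k in ret_docs.keys() & reljudges.keys()}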

Which outputs a dict that looks like this (dummy values):

{
    'q1'  : 0.1,
    'q2'  : 0.3,
    ...,
    'q10': 0.2
}

I need to use the same ret_docs and reljudges to calculate the average precision and the mean average precision. My understanding is that average precision is the average of the precisions at multiple values of n, but how do I know which n values to use, or is this not even necessary? Average precision would be a dictionary just like prec, except holding the average precision for each query instead. I know that mean average precision would then just be something like: mean_avg_pre = np.array(list(avg_pre.values())).mean()

Hefe

1 Answer

Look at the formula for average precision: https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision

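The formula given there is:

AveP = ( sum over k = 1..n of P(k) * rel(k) ) / (total number of relevant documents)

where P(k) is the precision at cut-off k and rel(k) is 1 if the document at rank k is relevant and 0 otherwise.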

You can see that the only parameter you have to set for the computation is k (n is the number of retrieved documents). The right value depends on your use case: k defines the length of the list of documents you want to evaluate. If you want your system to perform well on the entire list of returned documents, you can set k to the total number of retrieved documents. If you are only interested in the top 20 results, you can set it to 20.

For example, if you are ranking products for an e-commerce website viewed on a mobile device, k is usually the number of documents the user sees. If 6 products are shown at first when searching, you can set k to 6. For a desktop, this value can be higher (since users can see more products).
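As a rough sketch of how this could look with dictionaries shaped like ret_docs and reljudges: the helper names average_precision and mean_average_precision are made up for illustration, and the code assumes the definition on the linked Wikipedia page (sum the precision at each rank where a relevant document appears, then divide by the total number of relevant documents for that query). Other normalisations exist, so treat this as one possible reading rather than the definitive one. The cut-off k is optional and defaults to the whole retrieved list.

```python
from statistics import mean


def average_precision(retrieved, relevant, k=None):
    """Average precision for one query over the top-k results (k=None: all)."""
    ranked = retrieved if k is None else retrieved[:k]
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumption: a query with no judgements scores 0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumption: divide by the total number of relevant documents.
    return precision_sum / len(relevant)


def mean_average_precision(ret_docs, reljudges, k=None):
    """Return (per-query AP dict, MAP) for queries present in both dicts."""
    avg_pre = {
        q: average_precision(ret_docs[q], reljudges[q], k)
        for q in ret_docs.keys() & reljudges.keys()
    }
    return avg_pre, mean(avg_pre.values())


# Two queries copied from the question's data.
ret_docs = {
    'q3': ['d707', 'd144', 'd542', 'd329', 'd395'],
    'q6': ['d798', 'd97', 'd99', 'd927', 'd1195'],
}
reljudges = {
    'q3': ['d5', 'd6', 'd90', 'd91', 'd119', 'd144', 'd181', 'd399'],
    'q6': ['d99', 'd115', 'd257', 'd258'],
}

avg_pre, map_score = mean_average_precision(ret_docs, reljudges)
print(avg_pre['q3'])  # 0.0625: one hit at rank 2 (precision 1/2), 8 relevant documents
print(map_score)
```

Passing k=6 (or any other cut-off) as the third argument restricts the evaluation to the top k results, so no separate loop over several n values is needed.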

Seasers