
Is there anybody who has used TREC_EVAL? I need a "TREC_EVAL for dummies".

I'm trying to evaluate a few search engines to compare measures like recall/precision, ranking quality, etc. for my thesis work. I cannot figure out how to use TREC_EVAL to send queries to the search engines and get a result file that can be used with TREC_EVAL.


1 Answer


Basically, for trec_eval you need a (human-generated) ground truth. It has to be in a special format:

query-number 0 document-id relevance

Given a collection like 101Categories (see the Wikipedia entry), that would be something like

Q1046   0   PNGImages/dolphin/image_0041.png    0
Q1046   0   PNGImages/airplanes/image_0671.png  128
Q1046   0   PNGImages/crab/image_0048.png   0
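
If you have more than a handful of judgments, it is usually easier to script the qrel file than to type it by hand. Here is a minimal sketch in Python, assuming your judgments already live in a dict; the query id, the paths and the file name `groundtruth.qrel` are just placeholders taken from the example above:

```python
# Minimal sketch: write human judgments in qrel format
# ("query-number 0 document-id relevance").
# The dict holds made-up example data; replace it with your own judgments.
judgments = {
    "Q1046": {
        "PNGImages/dolphin/image_0041.png": 0,
        "PNGImages/airplanes/image_0671.png": 128,
        "PNGImages/crab/image_0048.png": 0,
    },
}

with open("groundtruth.qrel", "w") as f:
    for query_id, docs in judgments.items():
        for doc_id, relevance in docs.items():
            # the second column ("iteration") is unused and conventionally 0
            f.write(f"{query_id} 0 {doc_id} {relevance}\n")
```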

The query-number therefore identifies a query (e.g. a picture from a certain category for which similar ones should be found). The results from your search engine then have to be transformed to look like

query-number    Q0  document-id rank    score   Exp

or in reality

Q1046   0   PNGImages/airplanes/image_0671.png  1   1   srfiletop10
Q1046   0   PNGImages/airplanes/image_0489.png  2   0.974935    srfiletop10
Q1046   0   PNGImages/airplanes/image_0686.png  3   0.974023    srfiletop10
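
The same goes for the run file. A minimal sketch, assuming your engine hands you a ranked list of (document-id, score) pairs per query; the run tag `srfiletop10` and the file name `results` are only examples:

```python
# Minimal sketch: convert ranked search results into the run format
# "query-number Q0 document-id rank score run-tag".
# The list below is made-up example data from a hypothetical engine.
results = {
    "Q1046": [
        ("PNGImages/airplanes/image_0671.png", 1.0),
        ("PNGImages/airplanes/image_0489.png", 0.974935),
        ("PNGImages/airplanes/image_0686.png", 0.974023),
    ],
}

with open("results", "w") as f:
    for query_id, ranked_docs in results.items():
        for rank, (doc_id, score) in enumerate(ranked_docs, start=1):
            f.write(f"{query_id} Q0 {doc_id} {rank} {score} srfiletop10\n")
```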

as described here. You might have to adjust the path names for the "document-id". Then you can calculate the standard metrics with `trec_eval groundtruth.qrel results`. `trec_eval --help` should give you some ideas on how to choose the right parameters for the measures needed for your thesis.
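If you want to drive the evaluation from the same script, you can also call trec_eval like any other command line tool. A minimal sketch, assuming the trec_eval binary is on your PATH and the two files from above exist:

```python
import subprocess

# Minimal sketch: run trec_eval on the qrel and run files created above
# and print its report (roughly one line per measure: name, query id or "all", value).
completed = subprocess.run(
    ["trec_eval", "groundtruth.qrel", "results"],
    capture_output=True,
    text=True,
    check=True,
)
print(completed.stdout)
```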

trec_eval does not send any queries; you have to prepare those yourself. trec_eval only does the analysis, given a ground truth and your results.

Some basic information can be found here and here.

  • Hi @mbx, how did you calculate the numbers under the 'score' column above? (It says: 1, 0.974935, 0.974023.) I've read that they represent the degree of similarity between the row's result doc and the correct relevant doc, but I can't find how one would arrive at those numbers (except for '1', which I assume indicates 100% accuracy). – Noon Time Mar 27 '17 at 18:33
  • @NoonTime iirc the first number is the position in the output (of topX) and the second is the ranking of the answer "how close does this output get if your input is 1" - so it completely depends on the algorithm you want to measure. – mbx Mar 28 '17 at 07:15
  • ok thanks @mbx, but mathematically, how did you get that 0.974935 number? I know it's derived from {last_position - 1}; are you dividing that by the total number of retrieved results and using that fraction? Like if you had 100 results, the second row's (second result's) score would be (100-1)/100, so .99? – Noon Time Mar 30 '17 at 05:02
  • @NoonTime To have an exact answer I'd have to recover my gitosis to look into my scripts for generating the trec_eval input. But it should depend on the data and its rating according to your metric. Consider color values in RGB. If your db contains black 000, red F00, yellow FF0, green 0F0 and white FFF, and you value each color channel the same (you shouldn't, but for simplicity), searching for the nearest 4 matches of white FFF should give you white FFF 1 1, yellow FF0 2 0.66, red F00 3 0.33, green 0F0 4 0.33. Your algo could even swap green and red as they'd have the same distance in this metric. – mbx Mar 30 '17 at 05:21
  • @mbx `10.2452/551-AH Q0 H-810631-57604S3 1 543.528 Exp` is what the `RetEval` command generates for me (it's one of 1000 lines of the output file). When I run `trec_eval` to compare, it gives me an error with this message: `Segmentation fault (core dumped)`. What can I do to fix this problem? – Saeed Zhiany Oct 27 '17 at 08:40
  • @SaeedZhiany A segfault indicates a bug in trec_eval, it should work without core dumping on any given input, even if wrong/unexpected. You should have seen a parse error instead. – mbx Oct 27 '17 at 09:36
  • @mbx I built trec_eval with `make` and `make install` on Ubuntu 17.10 with gcc 7.2.0. Is that OK? – Saeed Zhiany Oct 27 '17 at 09:40