7

I study papers on link prediction in knowledge networks. Authors usually report "Hits@k". I wonder how to calculate hits@k and what does it mean about the model and the results?

SilentFlame
  • 487
  • 5
  • 15

1 Answers1

13

In a nutshell, it is the count of how many positive triples are ranked in the top-n positions against a bunch of synthetic negatives.

In the following example, pretend the test set includes two ground truth positive only:

Jack   born_in   Italy
Jack   friend_with   Thomas

Let's assume such positive triples (identified by * below) are ranked against four synthetic negatives each.

Now, assign a score to each of the positives and its synthetic negatives using your pre-trained embedding model. Then, sort the triples in descending order. In the example below, the first triple ranks 2nd, and the other triple ranks first (against their respective synthetic negatives):

s        p         o            score   rank
Jack   born_in   Ireland        0.789      1
Jack   born_in   Italy          0.753      2  *
Jack   born_in   Germany        0.695      3
Jack   born_in   China          0.456      4
Jack   born_in   Thomas         0.234      5

s        p         o            score   rank
Jack   friend_with   Thomas     0.901      1  *
Jack   friend_with   China      0.345      2
Jack   friend_with   Italy      0.293      3
Jack   friend_with   Ireland    0.201      4
Jack   friend_with   Germany    0.156      5

Then, count how many positives occur in the top-1 or top-3 positions, and divide by the number of triples in the test set (which in this example includes 2 triples):

Hits@3= 2/2 = 1.0
Hits@1= 1/2 = 0.5

AmpliGraph has an API to compute Hits@n - check out the documentation here.

lukostaz
  • 176
  • 1
  • 6