Questions tagged [ranking]

Ranking is the position of an element in a sorted list of elements. Usually, a high ranking means the element scores well on some metric.

Ranking is used in:

  • sorting algorithms
  • information retrieval, to represent the relevance of documents with respect to search queries
  • recommendation, to rank the top-k items
1634 questions
520
votes
14 answers

Using LIMIT within GROUP BY to get N results per group?

The following query: SELECT year, id, rate FROM h WHERE year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2) GROUP BY id, year ORDER BY id, rate DESC yields: year id rate 2006 p01 8 2003 p01 7.4 2008 p01 6.8 2001 p01…
Wells
  • 10,415
  • 14
  • 55
  • 85
161
votes
25 answers

A better similarity ranking algorithm for variable length strings

I'm looking for a string similarity algorithm that yields better results on variable length strings than the ones that are usually suggested (Levenshtein distance, Soundex, etc). For example, given string A: "Robert", then string B: "Amy…
marzagao
  • 3,756
  • 4
  • 19
  • 14
105
votes
9 answers

How do I find the closest values in a Pandas series to an input number?

I have seen: How do I find the closest value to a given number in an array? How do I find the closest array element to an arbitrary (non-member) number? These relate to vanilla Python and not pandas. If I have the series: ix num 0 1 1 …
Steve
  • 2,764
  • 4
  • 27
  • 32
64
votes
2 answers

Ranking order per group in Pandas

Consider a dataframe with three columns: group_ID, item_ID and value. Say we have 10 item_IDs total. I need to rank each item_ID (1 to 10) within each group_ID based on value, and then see the mean rank (and other stats) across groups (e.g. the IDs…
Amelio Vazquez-Reina
  • 91,494
  • 132
  • 359
  • 564
63
votes
5 answers

Find the n most common values in a vector

I have a vector, say c(1,1,1,1,1,1,2,3,4,5,7,7,5,7,7,7). How do I count each element, and then return, e.g., the 3 most common elements, i.e. 1, 7, 5?
ChairmanMeow
  • 843
  • 3
  • 10
  • 12
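As an illustration of the counting step described in the question above, here is a minimal sketch in Python rather than R (the question itself concerns an R vector). The function name `most_common_values` is hypothetical; it relies only on `collections.Counter` from the standard library.

```python
from collections import Counter


def most_common_values(values, n):
    """Return the n most frequent values, most frequent first."""
    # Counter.most_common returns (value, count) pairs sorted by count,
    # highest first; only the values are kept here.
    return [value for value, _count in Counter(values).most_common(n)]


data = [1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 7, 7, 5, 7, 7, 7]
print(most_common_values(data, 3))
```

With the vector from the question, the three values returned are 1, 7 and 5, matching the expected result stated there.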
49
votes
13 answers

Efficient method to calculate the rank vector of a list in Python

I'm looking for an efficient way to calculate the rank vector of a list in Python, similar to R's rank function. In a simple list with no ties between the elements, element i of the rank vector of a list l should be x if and only if l[i] is the x-th…
Tamás
  • 47,239
  • 12
  • 105
  • 124
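The no-ties case described in the question above can be sketched as follows. This is an assumption-laden illustration, not the answer from the thread: it assumes ranks are 1-based and ascending (smallest value gets rank 1, as in R's rank function), and that the input contains no ties. The name `rank_vector` is hypothetical.

```python
def rank_vector(values):
    """Return 1-based ascending ranks for a list with no tied values."""
    # Indices of the input, ordered by the value they point to.
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    # The index that comes first in sorted order gets rank 1, and so on.
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


print(rank_vector([30, 10, 20]))  # prints [3, 1, 2]
```

Sorting dominates the cost, so this runs in O(n log n). Handling ties (for example by averaging ranks) would need extra logic that is not shown here.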
37
votes
6 answers

What string similarity algorithms are there?

I need to compare 2 strings and calculate their similarity, to filter down a list of the most similar strings. e.g. searching for "dog" would return dog doggone bog fog foggy e.g. searching for "crack" would…
Robin Rodricks
  • 110,798
  • 141
  • 398
  • 607
36
votes
7 answers

Python implementation of the Wilson Score Interval?

After reading How Not to Sort by Average Rating, I was curious if anyone has a Python implementation of a Lower bound of Wilson score confidence interval for a Bernoulli parameter?
Jeff Bauer
  • 13,890
  • 9
  • 51
  • 73
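For reference, a minimal sketch of the lower bound of the Wilson score interval mentioned in the question above. It uses the standard closed-form expression for a Bernoulli proportion; the function name `wilson_lower_bound`, the parameter names, and the default `z = 1.96` (roughly a 95% two-sided interval) are assumptions made for illustration.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for positive/total."""
    if total == 0:
        # No ratings at all: treat the lower bound as 0 (an assumption).
        return 0.0
    phat = positive / total
    denominator = 1 + z * z / total
    centre = phat + z * z / (2 * total)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
    return (centre - margin) / denominator


# Same observed proportion (90%), but more data gives a higher lower bound.
print(wilson_lower_bound(9, 10))
print(wilson_lower_bound(90, 100))
```

Sorting items by this lower bound instead of by the raw average favors items whose high rating is backed by more votes.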
35
votes
1 answer

Hot content algorithm / score with time decay

I have been reading + researching on algorithms and formulas to work out a score for my user-submitted content to display currently hot / trending items higher up the list, however I'll admit I'm a little over my head here. I'll give some background…
Paul Hinett
  • 1,951
  • 2
  • 26
  • 40
30
votes
6 answers

Ranking with millions of entries

I'm working on a server for an online game which should be able to handle millions of players. Now the game needs leaderboards and wants to be able to show a player's current position and possibly other players near the current player's position as…
Naatan
  • 3,424
  • 4
  • 32
  • 51
30
votes
4 answers

"Most popular" GROUP BY in LINQ?

Assuming a table of tags like the Stack Overflow question tags: TagID (bigint), QuestionID (bigint), Tag (varchar). What is the most efficient way to get the 25 most used tags using LINQ? In SQL, a simple GROUP BY will do: SELECT Tag, COUNT(Tag)…
tags2k
  • 82,117
  • 31
  • 79
  • 106
29
votes
8 answers

Percentage rank of matches using Levenshtein Distance matching

I am trying to match a single search term against a dictionary of possible matches using a Levenshtein distance algorithm. The algorithm returns a distance expressed as number of operations required to convert the search string into the matched…
user1368587
  • 321
  • 1
  • 3
  • 5
26
votes
3 answers

Normalizing the edit distance

Can we normalize the Levenshtein edit distance by dividing the edit distance value by the length of the two strings? I am asking this because, if we compare two strings of unequal length, the difference between the lengths of the two…
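To make the question above concrete, here is a minimal sketch of one possible normalization: dividing the Levenshtein distance by the length of the longer string, which keeps the result in the range 0 to 1. This choice of divisor is an assumption for illustration, not the answer given in the thread, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Distance divided by the longer length: 0.0 is identical, 1.0 is maximal."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))          # prints 3
print(normalized_distance("kitten", "sitting"))  # 3 / 7
```

Because the raw distance can never exceed the longer length, this ratio is bounded by 1, which makes scores comparable across string pairs of different lengths.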
25
votes
3 answers

Comparison-based ranking algorithm

I would like to rank or sort a collection of items (with size potentially greater than 100,000) where items in the collection have no intrinsic (comparable) value, instead all I have is the comparisons between any two items which have been provided…
driangle
  • 11,601
  • 5
  • 47
  • 54
24
votes
5 answers

Search ranking/relevance algorithms

When developing a database of articles in a Knowledge Base (for example), what are the best ways to sort and display the most relevant answers to a user's question? Would you use additional data such as keyword weighting based on whether previous…
Tom
  • 5,835
  • 4
  • 25
  • 30