Questions tagged [ranking-functions]

Ranking functions return a ranking value for each row in a partition. Depending on the function that is used, some rows might receive the same value as other rows. Ranking functions are non-deterministic.

193 questions
20
votes
3 answers

How to add row number column in SQL Server 2012

I'm trying to add a new column to an existing table, where the value is the row number/rank. I need a way to generate the row number/rank value, and I also need to limit the rows affected--in this case, the presence of a substring within a…
Michael Carper
  • 331
  • 2
  • 4
  • 10
17
votes
4 answers

DataFrame: add column whose values are the quantile number/rank of an existing column?

I have a DataFrame with some columns. I'd like to add a new column where each row value is the quantile rank of one existing column. I can use DataFrame.rank to rank a column, but then I don't know how to get the quantile number of this ranked…
luca
  • 7,178
  • 7
  • 41
  • 55
15
votes
1 answer

Efficiently determine "how sorted" a list is, eg. Levenshtein distance

I'm doing some research on ranking algorithms, and would like to, given a sorted list and some permutation of that list, calculate some distance between the two permutations. For the case of the Levenshtein distance, this corresponds to calculating…
13
votes
3 answers

SQL over clause - dividing partition into numbered sub-partitions

I have a challenge, that I've come across at multiple occasions but never been able to find an efficient solution to. Imagine I have a large table with data regarding e.g. bank accounts and their possible revolving moves from debit to…
12
votes
1 answer

Calculating SQL Server ROW_NUMBER() OVER() for a derived table

In some other databases (e.g. DB2, or Oracle with ROWNUM), I can omit the ORDER BY clause in a ranking function's OVER() clause. For instance: ROW_NUMBER() OVER() This is particularly useful when used with ordered derived tables, such as: SELECT…
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
8
votes
2 answers

Algorithm/formula to calculate Product ranking on a ecommerce website(Based on following criteria)

I have been Working on algorithms and formulas to find out a score for the products available on my ecommerce website.Basically, I want to calculate some kind of score to rank the products when a user searches it. I'll give some background on the…
user4115825
7
votes
4 answers

Rank array values with potential duplicate values and skipping some positions if there is a tie

I am working with database data that manipulates college students exam results. Basically, I am pulling the records from a MySQL database and pulling one class at any given time. I want to rank the students with the highest performer given the rank…
Bululu
  • 555
  • 2
  • 8
  • 14
6
votes
1 answer

Python Pandas qcut behavior with # of observations not divisible by # of bins

Suppose I had a pandas series of dollar values and wanted to discretize into 9 groups using qcut. The # of observations is not divisible by 9. SQL Server's ntile function has a standard approach for this case: it makes the first n out of 9 groups…
AllenQ
  • 1,659
  • 2
  • 16
  • 18
6
votes
2 answers

Finding repeated occurrences with ranking functions

Please help me generate the following query i've been struggling with for some time now. Lets' say I have a simple table with month number and information whether there were any failed events in this particular month Below a script to generate…
5
votes
1 answer

Windowing function in Hive

I am exploring windowing functions in Hive and I am able to understand the functionalities of all the UDFs. Although, I am not able to understand the partition by and order by that we use with the other functions. Following is the structure that is…
5
votes
1 answer

Is NDCG (normalized discounted gain) flawed? I have calculated a few alternative ranking quality measures, and I can't make heads or tails of it

I'm using python for a learning-to-rank problem, and I am evaluating my success using the following DCG and NDCG code (from http://nbviewer.ipython.org/github/ogrisel/notebooks/blob/master/Learning%20to%20Rank.ipynb ) def dcg(relevances, rank=20): …
5
votes
1 answer

What is the difference between point-wise and pair-wise ranking in machine learning

I've been reading papers on pairwise ranking and this is what I don't get: what is the difference in the training/testing data between pointwise and pairwise ranking? This is the paper that I have been…
Martin Klosi
  • 3,098
  • 4
  • 32
  • 39
4
votes
4 answers

Scalable time decay for web application

My goal here is to generate a system similar to that of the front page of reddit. I have things and for the sake of simplicity these things have votes. The best system I've generated is using time decay. With a halflife of 7 days, if a vote is…
Piper Merriam
  • 2,774
  • 2
  • 24
  • 30
4
votes
1 answer

postgresql Select all rows which have max value in one column

I'm new to postgresql, I have a table T like this in postgresql: C1 C2 C3 C4 ID C1,C2, C4 are int. C3 is a char C4 is essentially the version number. and can be anything between 1-N. ( For example, after one set of insert events the values in table…
DJ_Stuffy_K
  • 615
  • 2
  • 11
  • 29
4
votes
1 answer

Variable rank that doesn't skip places in R

I have a vector that looks something like this: > vec [1] 1 1 2 5 10 I'm trying to convert this to a form of non-olympic ranking where ties are still ties, but the ranks are guaranteed 1 apart, even when there are multiple ties above it, so: >…
Mittenchops
  • 18,633
  • 33
  • 128
  • 246
1
2 3
12 13