-1

I want to regularly update the rows in a database but also want more popular rows (rows that are requested more) to be updated more often. I thought I could base this on two parameters: popularity and time since the last update.

id | title | popularity | last_updated
---+-------+------------+-------------
1  | A     | 148        | 20190501
2  | B     | 3874       | 20190201
3  | C     | 29         | 20190104
4  | D     | 242        | 20190314
5  | E     | 398        | 20190325
6  | F     | 3          | 20190128

The last_updated column is formated as YYYYMMDD.

I first thought I could simply use ORDER BY:

SELECT * FROM results ORDER BY last_updated, popularity;

However, this query will only care about the last_updated column if the value of two or more rows' popularity is the same.

Would it be possible to e.g. turn popularity and last_updated into another value that can be used for this kind of sorting?

David
  • 33
  • 1
  • 6
  • Please show your sample data and expected output for that data. – Pham X. Bach May 27 '19 at 09:51
  • It's not clear how last_updated comes into play here, maybe it should be part of a WHERE condition instead? – Joakim Danielson May 27 '19 at 10:02
  • @JoakimDanielson I've updated my question with some additional information. Since I want to make sure all rows are regularly updated, rows that haven't been updated in a while should have a higher probability of being updated. However, I also want more popular rows to be updated more often compared to less popular. E.g. a row with the popularity of 1000 should be updated more often than a row with the popularity of 1. – David May 27 '19 at 10:09
  • When you say updated, you don't mean updated in the database, right? You mean refreshed on a UI somewhere? – codeulike May 27 '19 at 10:36
  • @codeulike No, I mean updated in the database. I regularly update the rows to make sure their information is up-to-date (I forgot to post an example of this in the description, sorry). More popular rows and rows that haven't been updated in a while should be prioritized. – David May 27 '19 at 10:50
  • It is possible -- in a zillion different ways. You need to specify what you want the balance to look like. What is the relationship between `last_updated` and `popularity`? Is that supposed to be the last time someone votes on a page as being "popular"? – Gordon Linoff May 27 '19 at 11:22
  • @GordonLinoff Let's say the titles are parsed from a 3rd party web page. The titles are constantly updated so the database needs to regularly update its rows with the new information. Every time one of the rows is updated, the row's `last_updated` is set to the current date. `popularity` is the number of times a certain row has been requested. Popular rows should be updated more often than less popular rows but at the same time, the less popular rows can't be ignored completely (they shouldn't just get updated as often). – David May 27 '19 at 11:39

3 Answers3

0

Difficulty will be to define the level of importance between the popularity and last update.

How is the popularity value calculated ? Will it ever decrease ? Most subjects are popular for while (hipes) and become quick old news. To know the current popularity of a record, I suggest to keep an extra column popularityOnUpdate which holds the popularity value at the moment of the last update. When the popularity decreases the increment of the popularity value will fade out and so the difference between the popularityOnUpdate value and the popularity value.

The query can become:

order by POP_WEIGHT*(popularity - popularityOnUpdate)
       + UPD_WEIGHT*(current date - lastUpdate) DESCENDING

POP_WEIGHT and UPD_WEIGHT are constants you have to define which express the importance of the 2 criteria (popularity and last update). Once the popularity slows down, the last update value will push the record up in the ranking.

Conffusion
  • 4,335
  • 2
  • 16
  • 28
0

There are of course many ways to select a formula for bringing popularity and last_update together into a single value to order by. So this is more of a template than a ready answer

To just use popularity we can make use of the random() function to add a factor to the popularity

SELECT * 
FROM results 
ORDER BY popularity * random() DESC

For the date it makes sense to subtract it from today and then use random() on that result so if we add that

SELECT * 
FROM results 
ORDER BY popularity * random() - (current_date - last_updated) * random() DESC

As pointed out in a comment I might have misunderstood the priority of last_update so if older updates should be more likely to be updated then switch the sign

SELECT * 
FROM results 
ORDER BY popularity * random() + (current_date - last_updated) * random()

so now we subtract the days from the last update so that newly updated gets a higher value but again we use random() so that the order isn't given. This could serve as a start but I am not sure if the balance between the two parts is good or if perhaps one of them needs to be multiplied by some factor depending on how popularrity vs last_updated should affect the result.

Then one could also add a LIMIT to the SELECT and perhaps even let that value to be multiplied by random()

Joakim Danielson
  • 43,251
  • 5
  • 22
  • 52
  • I think it should be `last_updated - current_date` rather than `current_date - last_updated`. Otherwise, it'll prioritize rows that were just updated. – David May 27 '19 at 12:41
  • @David I think you are right. I have prioritised recently updated but it should probably be the other way around. Of course rather than switching the order of the subtraction the result could be added instead of subtracted from the popularity value. – Joakim Danielson May 27 '19 at 12:44
  • One last question: is it possible to make the popularity relevant to the value in the other rows? Thus, if a certain row is really popular (e.g. has a popularity of 1000 while the other rows only have around 0 - 100) it will, at the moment, always be in the top. – David May 27 '19 at 13:11
  • @David This is why I wrote that this is more of a template and it needs more work. If you look at the math functions in postgresql you see that for instance there is a log() function and using this on popularity would compress the range since log(1000) is 3 and log(100) is 2 and so on. There might be [others](https://www.postgresql.org/docs/9.5/functions-math.html) that are interesting. – Joakim Danielson May 27 '19 at 13:23
  • @David . . . Wow. I fail to see out adding random numbers to the popularity does anything useful. – Gordon Linoff May 27 '19 at 15:33
-1

Your Query is true

SELECT * FROM results ORDER BY last_updated, popularity;

It will be sorted by last_updated Ascending, If you have duplicated values of last_updated column it will be sorted by popularity

Just see this answer and you will understand me SQL multiple column ordering