Selecting every Nth row per user in Postgres

Question

I was using this SQL statement:

SELECT "dateId", "userId", "Salary" 
FROM (
   SELECT *, 
          (row_number() OVER (ORDER BY "userId", "dateId"))%2 AS rn 
   FROM user_table
 ) sa 
 WHERE sa.rn=1 
   AND "userId" = 789 
   AND "Salary" > 0;

But every time the table gets new rows the result of the query is different.
Am I missing something?

You need to tell us what you want to do and what a "new row" looks like, especially the "userId" and "dateId" you use for ORDER BY in the window function. — Erwin Brandstetter, Sep 22 '11 at 21:21
I want to optimize this query to give me the same results everytime I'm updating the table. dateId - just decimal replresentation of the date — Timka, Sep 23 '11 at 02:16
Changed "dateId" changes the sort order in your window function. Hence, changes in the result are to be expected. — Erwin Brandstetter, Sep 23 '11 at 11:53
How can I avoid this? I was thinking that adding "dateId" to the bottom of the table shouldn't affect the sort order... — Timka, Sep 23 '11 at 14:09

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

5

Assuming that ("dateId", "userId") is unique and new rows always have a bigger (later) dateId.

After some comments:

What I think you need:

SELECT "dateId", "userId", "Salary"
FROM (
   SELECT "dateId", "userId", "Salary"
         ,(row_number() OVER (PARTITION BY "userId"   -- either this
                              ORDER BY "dateId")) % 2 AS rn
   FROM   user_table
   WHERE  "userId" = 789                              -- ... or that
   ) sub
WHERE  sub.rn = 1
AND    "Salary" > 0;

Notice the PARTITION BY. This way you skip every second dateId for each userId, and additional (later) rows don't change the selection so far.

Also, as long as you are selecting rows for a single userId (WHERE "userId" = 789), pull the predicate into the subquery, achieving the same effect (stable selection for a single user). You don't need both.

The WHERE clause in the subquery only works for a single user, PARTITION BY works for any number of users in one query.

_{Is that it? Is it?

_{They should give me "detective" badge for this.

_Seriously.}}

edited Jun 20 '20 at 09:12

Community

1
1

answered Sep 23 '11 at 14:36

Erwin Brandstetter

605,456
145
1,078
1,228

Thank you for your reply. My "dateId" is a foreign key(unique). I'm going to try with some "sequenceId" but doubt that it will make a difference – Timka Sep 23 '11 at 17:58
"dateId" may be unique in the referenced table, but can still be referenced multiple times in user_table. You are aware of that, right? Try my little test for dupes, then you know for sure. – Erwin Brandstetter Sep 23 '11 at 18:46
I run the test and got nothing in results. The way I created the table is that particular user+ particular date can only exist once in the db – Timka Sep 26 '11 at 14:36
@TimGL: Can you tell me what the intention behind this code is: `(row_number() OVER (ORDER BY "userId", "dateId"))%2 AS rn ... WHERE sa.rn=1` I suggest you edit your question with that information. I suspect I can solve your problem, once I understand what this is supposed to achieve. Also, I still wonder if my assumption about "userId" is correct (see start of my answer). – Erwin Brandstetter Sep 26 '11 at 19:03
Brandstetter: You are right about "userId" - it is incremental. – Timka Sep 26 '11 at 20:46
The reason I'm doing %2 is because I need every 3rd record for my calculations(stddev).Ultimately I want to be able to query table with different time frames.I want to get 1,3,5,7 rows, then 3 days table gets updates(+8,9,10) and the next time I'm running this query I should be able to get 1,3,5,7,9 rows and not 2,4,6,8,10 – Timka Sep 26 '11 at 20:54
You are aware that %2 gives every **2nd** row? Try this demo: `select x, x%2 from generate_series(1,20,1) x`. I am having a hard time to understand the rest of your comment. – Erwin Brandstetter Sep 26 '11 at 21:11
It is every 2nd row but every 3rd day:) I'm sorry for the broken English here. What I meant was that I need my results be the same no matter how many rows I'm querying. – Timka Sep 27 '11 at 02:33

score 0 · Answer 2 · answered Sep 22 '11 at 21:17

0

No that seems to be OK. You have new rows, those rows change the old rows to appear on different position after sorting.

answered Sep 22 '11 at 21:17

Szymon Lipiński

27,098
17
75
77

score 0 · Answer 3 · answered Sep 22 '11 at 21:25

0

If someone insert a new row with a userId below 789 the order will change. For example, if you have:

and you insert a row with userId = 2, the rn will change:

In order to select every Nth row you need a column with a sequence or a timestamp.

answered Sep 22 '11 at 21:25

DavidEG

5,857
3
29
44

This is my table: "dateId", "userId", "Salary" and every day I'm updating the table with the new date(sequence, decimal), userId stays the same and Salary changes: - 15234, 789, 32 - 15237, 789, 35 - 15238, 789, 36 dateId - is a sequence, but doesn't includet weekends and holidays. – Timka Sep 23 '11 at 02:13

Selecting every Nth row per user in Postgres

3 Answers3

After some comments:

Linked