2

I was using this SQL statement:

SELECT "dateId", "userId", "Salary" 
FROM (
   SELECT *, 
          (row_number() OVER (ORDER BY "userId", "dateId"))%2 AS rn 
   FROM user_table
 ) sa 
 WHERE sa.rn=1 
   AND "userId" = 789 
   AND "Salary" > 0;

But every time the table gets new rows the result of the query is different.
Am I missing something?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Timka
  • 1,769
  • 1
  • 20
  • 27
  • You need to tell us what you want to do and what a "new row" looks like, especially the "userId" and "dateId" you use for ORDER BY in the window function. – Erwin Brandstetter Sep 22 '11 at 21:21
  • I want to optimize this query to give me the same results everytime I'm updating the table. dateId - just decimal replresentation of the date – Timka Sep 23 '11 at 02:16
  • Changed "dateId" changes the sort order in your window function. Hence, changes in the result are to be expected. – Erwin Brandstetter Sep 23 '11 at 11:53
  • How can I avoid this? I was thinking that adding "dateId" to the bottom of the table shouldn't affect the sort order... – Timka Sep 23 '11 at 14:09

3 Answers3

5

Assuming that ("dateId", "userId") is unique and new rows always have a bigger (later) dateId.

After some comments:

What I think you need:

SELECT "dateId", "userId", "Salary"
FROM (
   SELECT "dateId", "userId", "Salary"
         ,(row_number() OVER (PARTITION BY "userId"   -- either this
                              ORDER BY "dateId")) % 2 AS rn
   FROM   user_table
   WHERE  "userId" = 789                              -- ... or that
   ) sub
WHERE  sub.rn = 1
AND    "Salary" > 0;

Notice the PARTITION BY. This way you skip every second dateId for each userId, and additional (later) rows don't change the selection so far.

Also, as long as you are selecting rows for a single userId (WHERE "userId" = 789), pull the predicate into the subquery, achieving the same effect (stable selection for a single user). You don't need both.

The WHERE clause in the subquery only works for a single user, PARTITION BY works for any number of users in one query.

Is that it? Is it?
They should give me "detective" badge for this.
Seriously.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you for your reply. My "dateId" is a foreign key(unique). I'm going to try with some "sequenceId" but doubt that it will make a difference – Timka Sep 23 '11 at 17:58
  • "dateId" may be unique in the referenced table, but can still be referenced multiple times in user_table. You are aware of that, right? Try my little test for dupes, then you know for sure. – Erwin Brandstetter Sep 23 '11 at 18:46
  • I run the test and got nothing in results. The way I created the table is that particular user+ particular date can only exist once in the db – Timka Sep 26 '11 at 14:36
  • @TimGL: Can you tell me what the intention behind this code is: `(row_number() OVER (ORDER BY "userId", "dateId"))%2 AS rn ... WHERE sa.rn=1` I suggest you edit your question with that information. I suspect I can solve your problem, once I understand what this is supposed to achieve. Also, I still wonder if my assumption about "userId" is correct (see start of my answer). – Erwin Brandstetter Sep 26 '11 at 19:03
  • Brandstetter: You are right about "userId" - it is incremental. – Timka Sep 26 '11 at 20:46
  • The reason I'm doing %2 is because I need every 3rd record for my calculations(stddev).Ultimately I want to be able to query table with different time frames.I want to get 1,3,5,7 rows, then 3 days table gets updates(+8,9,10) and the next time I'm running this query I should be able to get 1,3,5,7,9 rows and not 2,4,6,8,10 – Timka Sep 26 '11 at 20:54
  • You are aware that %2 gives every **2nd** row? Try this demo: `select x, x%2 from generate_series(1,20,1) x`. I am having a hard time to understand the rest of your comment. – Erwin Brandstetter Sep 26 '11 at 21:11
  • It is every 2nd row but every 3rd day:) I'm sorry for the broken English here. What I meant was that I need my results be the same no matter how many rows I'm querying. – Timka Sep 27 '11 at 02:33
0

No that seems to be OK. You have new rows, those rows change the old rows to appear on different position after sorting.

Szymon Lipiński
  • 27,098
  • 17
  • 75
  • 77
0

If someone insert a new row with a userId below 789 the order will change. For example, if you have:

userId rn
 1      1
 4      0
 5      1
 6      0

and you insert a row with userId = 2, the rn will change:

userId rn
 1      1
 2      0
 4      1
 5      0
 6      1

In order to select every Nth row you need a column with a sequence or a timestamp.

DavidEG
  • 5,857
  • 3
  • 29
  • 44
  • This is my table: "dateId", "userId", "Salary" and every day I'm updating the table with the new date(sequence, decimal), userId stays the same and Salary changes: - 15234, 789, 32 - 15237, 789, 35 - 15238, 789, 36 dateId - is a sequence, but doesn't includet weekends and holidays. – Timka Sep 23 '11 at 02:13