1

My table has values like (RowCount is generated by the query below):

ID       Date_trans   Time_trans  Price  RowCount
-------  -----------  ----------  -----  --------
1699093  22-Feb-2011  09:30:00    58.07  1
1699094  22-Feb-2011  09:30:00    58.08  1
1699095  22-Feb-2011  09:30:00    58.08  2
1699096  22-Feb-2011  09:30:00    58.08  3
1699097  22-Feb-2011  09:30:00    58.13  1
1699098  22-Feb-2011  09:30:00    58.13  2
1699099  22-Feb-2011  09:30:00    58.12  1
1699100  22-Feb-2011  09:30:08    58.13  3
1699101  22-Feb-2011  09:30:09    57.96  1
1699102  22-Feb-2011  09:30:09    57.95  1
1699103  22-Feb-2011  09:30:09    57.93  1
1699104  22-Feb-2011  09:30:09    57.96  2
1699105  22-Feb-2011  09:30:09    57.93  2
1699106  22-Feb-2011  09:30:09    57.93  3
1699107  22-Feb-2011  09:30:37    58     1
1699108  22-Feb-2011  09:30:37    58.08  4
1699109  22-Feb-2011  09:30:38    58.08  5
1699110  22-Feb-2011  09:30:41    58.02  1
1699111  22-Feb-2011  09:30:41    58.02  2
1699112  22-Feb-2011  09:30:41    58.01  1
1699113  22-Feb-2011  09:30:41    58.01  2
1699114  22-Feb-2011  09:30:41    58.01  3
1699115  22-Feb-2011  09:30:42    58.02  3
1699116  22-Feb-2011  09:30:42    58.02  4
1699117  22-Feb-2011  09:30:45    58.04  1
1699118  22-Feb-2011  09:30:54    58     2
1699119  22-Feb-2011  09:30:57    58.05  1

The ID column is an IDENTITY column.
And I'm using this query to get the consecutive row count as:

  SELECT   ID, Date_trans, Time_trans, Price
          ,ROW_NUMBER() OVER(PARTITION BY Price  ORDER BY ID) RowCount
  FROM     MyTable
  ORDER    BY ID;

The RowCount I get is right for most of the values but wrong for some values. For instance:

  • ID 1699100 Price 58.13 – count should be 1 (showing 3).
  • ID 1699104 Price 57.96 – count should be 1 (showing 2).
  • ID 1699105, 1699106 Price 57.93 – count should be 1, 2 (showing 2, 3).

I have tried the same query in PostgreSQL and found the same results.
I have uploaded a csv data sample here.

I'm stuck with such unexpected results of partition. Can anybody help me?

Andriy M
  • 76,112
  • 17
  • 94
  • 154
Mainuddin
  • 416
  • 5
  • 15
  • I don't think you quite understand analytic queries. The price of ID 1699100 is 58.13 and it is the third highest ID with this price so the answer is correct. What are you trying to do? Count the number of rows with a specific price? – Ben Aug 05 '12 at 19:13
  • 1
    I think the `58.0` in `ID 1699100 Price 58.0 – count should be 1 (showing 3);` should actually read `58.13` (or the prices in rows 1699097, 1699098, 1699100 should be `58.0` instead of `58.13`). – Andriy M Aug 05 '12 at 19:42
  • I took the liberty to fix the example in the question according to @AndriyM's findings. Please correct, if I am wrong. – Erwin Brandstetter Aug 05 '12 at 23:46

4 Answers4

4

The PARTITION BY clause of the ROW_NUMBER() function instructs it to partition the entire row set by Price values and assign row numbers in the ascending order of IDs.

It seems like you want to distinguish between any two groups of rows with identical Price values that are separated by at least one row with a different Price.

There may be various ways to achieve that. In SQL Server (and I think the same would work in PostgreSQL too), I would first use two ROW_NUMBER() calls to get an additional partitioning criterion, then rank rows once again using that criterion, like this:

WITH partitioned AS (
  SELECT
    ID,
    Date_trans,
    Time_trans,
    Price,
    ROW_NUMBER() OVER (                   ORDER BY ID) -
    ROW_NUMBER() OVER (PARTITION BY Price ORDER BY ID) AS PriceGroup
  FROM MyTable
)
SELECT
  ID,
  Date_trans,
  Time_trans,
  Price,
  ROW_NUMBER() OVER (PARTITION BY Price, PriceGroup ORDER BY ID) AS RowCount
FROM partitioned
ORDER BY ID
;

Here's a SQL Fiddle demo.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
2

Pure SQL

WITH x AS (
    SELECT id, date_trans, time_trans, price
         ,(price <> lag(price) OVER (ORDER BY id))::int AS step
    FROM   tbl
    )
    ,y AS (
    SELECT *, sum(step) OVER (ORDER BY id) AS grp
    FROM   x
    )
SELECT id, date_trans, time_trans, price
      ,row_number() OVER (PARTITION BY grp ORDER BY id) As row_ct
FROM   y
ORDER  BY id;

The logic:

  1. Remember when the price changes compared to the last row in step. (Special case of first row works, too.)
  2. Sum up steps, so that identical prices in sequence end up in the same group grp.
  3. Number rows per group.

Honestly, I think @Andriy's solution is a wee bit more elegant. It needs three window functions, too, but can do it in only two query steps. In a quick test on the small sample it was also slightly faster. So, +1 from me.

If performance is of the essence, a more specialized solution with a

PL/pgSQL function

should be considerably faster, because it only needs to scan and order the table once.

CREATE OR REPLACE FUNCTION f_my_row_ct()
  RETURNS TABLE (
    id         int
   ,date_trans date
   ,time_trans time
   ,price      numeric
   ,row_ct     int
  ) AS
$BODY$
DECLARE
   _last_price numeric;   -- remember price of last row
BEGIN

FOR id, date_trans, time_trans, price IN 
   SELECT t.id, t.date_trans, t.time_trans, t.price
   FROM   tbl t
   ORDER  BY t.id
LOOP
   IF _last_price = price THEN   -- works with 1st row, too
      row_ct := row_ct + 1;
   ELSE
      row_ct := 1;
   END IF;

   RETURN NEXT;
   _last_price = price;   -- remember last price
END LOOP;

END;
$BODY$  LANGUAGE plpgsql;

Call:

SELECT * FROM f_my_row_ct()

In another quick test on the small sample this was 3-4x faster. Test with EXPLAIN ANALYZE to see.


As an aside: you could simplify your table (and queries) and save some bytes of storage by merging date_trans date and time_trans time into ts_trans timestamp.

It's very simple and very fast to extract date or time from a timestamp with a cast:

ts_trans::date
ts_trans::time

The manual about date/time types.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • A bunch of good points here, wish I could vote this more than once. – Andriy M Aug 06 '12 at 04:48
  • Thanks a lot. Actually I was looking for the solution in MS SQL; Just tested the result in PostgreSQL. – Mainuddin Aug 07 '12 at 05:02
  • @Mainuddin: You are aware that you tagged the question [PostgreSQL] and not MS SQL? – Erwin Brandstetter Aug 07 '12 at 07:55
  • @Mainuddin: Perhaps you thought that the `sql` tag meant "MS SQL". That is not so. *SQL* is the name of a language used in many database products, and the "MS SQL" thing is more correctly called *SQL Server*. We use the `sql-server` tag for questions related to that product. Please keep that in mind for the future. – Andriy M Aug 07 '12 at 11:35
  • Yes, I was looking for `sql server` but did not find then and thought `sql` to be `sql server`. Thanks. – Mainuddin Aug 09 '12 at 07:10
0
  • 1699100 Price 58.0 - is showing 3 because 1699097,8 are 1,2

  • 1699104 Price 57.96 – is showing 2 because 1669101 is 1.

  • 1699105, 1699106 Price 57.93 – showing 2, 3, because 1699103 is 1

If you want to find items of the same value in a sequence, one option is to join the data to the previous ID and see if the values are the same

podiluska
  • 50,950
  • 7
  • 98
  • 104
0

From what I can gather by your expections of results, you need to partition over Time_trans too:

  SELECT   ID, Date_trans, Time_trans, Price
           ,ROW_NUMBER() OVER(PARTITION BY Time_trans, Price ORDER BY ID) RowCount
  FROM     MyTable
  ORDER BY ID

I believe this is the case as your expecting the ROW_NUMBER to start again when the Time-trans value changes as you progress through the data.

Also you might want to add Date_trans in there too if there could be multiple dates in the table, which I would expect.

XN16
  • 5,679
  • 15
  • 48
  • 72