2

I have a question about SQL, and I honestly tried to search methods before asking. I will give an abstract (but precise) description below, and will greatly appreciate your example of solution (SQL query).

What I have:

  1. Table A with category ids of the items and prices (in USD) for each item. category id has int type of value, price is string and looks like "USD 200000000" (real value is multiplied by 10^7). Tables also has a kind column with int type of value.

  2. Table B with relation of category id and name.

What I need:

Get a table with price diapasons (like 0-100 | 100-200 | ...) as column names and count amount of items for each category id (as lines names) in all of the price diapasons. All results must be filtered by kind parameter (from table A) with value 3.

Questions, that I encountered (and which caused to ask for an example of SQL query):

  1. Cut "USD from price string value, divide it by 10^7 and convert to float.
  2. Gather diapasons of price values (0-100 | 100-200 | ...), with given step in the given interval (max price is considered as unknown at the start). Example: step 100 on 0-500 interval, and step 200 for values >500.
  3. Put diapasons of price values into column names of the result table.
  4. For each diapason, count amount of items in each category (category_id). Left limit of diapason shall not be considered (e.g. on 1000-1200 diapason, items with price 1000 shall not be considered).
  5. Using B table, display name instead of category id.

Response is appreciated, ignorance will be understood.

  • 2
    Which DBMS are you using? – Alex Zen Mar 24 '18 at 20:52
  • 1
    lets imagine, that it is PostgreSQL (no offense, I just can`t tell it, but the syntax of query is expected to be the same as described from the beginning of this article - https://www.w3schools.com/sqL/default.asp ) – Anton FromButovo Mar 24 '18 at 21:12

2 Answers2

1

If you only need category ids, then you do not need B. What you are looking for is conditional aggregation, something like:

select category_id,
       sum(case when cast(substring(price, 4, 100) as int)/10000000 < 100 then 1 else 0 end) as price_000_100
       sum(case when cast(substring(price, 4, 100) as int)/10000000 >= 100 and cast(substring(price, 4, 100) as int)/10000000 < 200
                then 1 else 0
           end) as price_100_200,
       . . .                 
from a
group by category_id
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Gordon, many thanks for your suggestion! But in this case, I will need to write an undefined amount of "sum" lines. Can I automate generation of diapasons based on the above-mentioned condition? Let me know, if my description lacks something. – Anton FromButovo Mar 24 '18 at 21:44
  • 1
    @AntonFromButovo: you can't have an "undefined" amount of columns in a SQL query. The number of columns must be defined _before_ the query is executed –  Mar 25 '18 at 06:02
  • @a_horse_with_no_name Potentially possible with SQL Server using STUFF, PIVOT and dynamic SQL - see https://stackoverflow.com/questions/15931607/convert-rows-to-columns-using-pivot-in-sql-server – Alan Mar 26 '18 at 08:12
  • 1
    @Alan: with dynamic SQL you still don't get rid of that limitation. The only difference is that you first calculate the number of columns, then _generate_ a statement where that number is known when you execute it. –  Mar 26 '18 at 08:13
1

There is no standard way to do what you describe.

That is because to do (3) you need a pivot aka crosstab, and this is not in ANSI SQL. Each DBMS has it's own implementation. Plus dynamic columns in a pivot table are an additional complication.

For example, Postgres calls it a "crosstab" and requires the tablefunc module to be installed. See this SO question and the documentation. Compare to SQL Server, which uses the PIVOT command.

You can get close using reasonably standard SQL.

Here is an example based on SQLite. A little bit of conversion would provide a solution for other systems, e.g. SUBSTR would be substring(string [from int] [for int]) in postgre.

Assuming a data table of format:

data table

and a category name table of:

category name table

then the following code will produce:

output as list

WITH dataCTE AS
    (SELECT product_id AS 'ID', CAST(SUBSTR(price, 5) AS INT)/1000000 AS 'USD',
    CASE WHEN (CAST(SUBSTR(price, 5) AS INT)/1000000) <= 500 THEN
        100 ELSE 200
        END AS 'Interval'
    FROM data
    WHERE kind = 3),
groupCTE AS
    (SELECT dataCTE.ID AS 'ID', dataCTE.USD AS 'USD', dataCTE.Interval AS 'Interval',
    CASE WHEN dataCTE.Interval = 100 THEN 
        CAST(dataCTE.USD AS INT)/100
    ELSE
        (CAST(dataCTE.USD-500 AS INT)/200)+5
    END AS 'GroupID'
    FROM dataCTE),
cleanCTE AS
    (SELECT *, CASE WHEN groupCTE.Interval = 100 THEN
        CAST(groupCTE.GroupID *100 AS VARCHAR) 
        || '-' || 
        CAST((groupCTE.GroupID *100)+99 AS VARCHAR)
    ELSE
        CAST(((groupCTE.GroupID-5)*200)+500 AS VARCHAR) 
        || '-' || 
        CAST(((groupCTE.GroupID-5)*200)+500+199 AS VARCHAR) 
    END AS 'diapason'
    FROM groupCTE
    INNER JOIN cat_name AS cn ON groupCTE.ID = cn.cat_id)
SELECT *
FROM cleanCTE;

If you modify the last SELECT to:

SELECT name, diapason, COUNT(diapason)
FROM cleanCTE
GROUP BY name, diapason;

then you get a grouped output:

output as group

This is as close as you will get without specifying the exact system; even then you will have a problem with dynamically creating the column names.

Alan
  • 2,914
  • 2
  • 14
  • 26