1

I have a table that looks like this:

AMT       TYPE
100       red, yellow, green, purple
200       red, green, blue, yellow
90        pink, blue, light red
......

What I want is to add up the amounts by color. For example, a final output should be:

AMT       TYPE
300       red
300       yellow
300       green
290       blue
190       other

Notice that 1. I don't want to include light red in red and 2. I want to include all colors other than red, yellow, green, blue in to a new category "other".

My current code is

select sum(red), ... from (
      select
        case when trim(regexp_substr(type, red',1,1,'i')) is not null
             then amt
        else 0 end as red
        ......
      from mytable)

But it doesn't solve the problems I mentioned earlier. I also tried the following method, but it becomes so slow that it never ends. (Or maybe there is some error in this code?)

select color, sum(amt)
from (
     select trim(regexp_substr(type,'[^,]+', 1, level)) as color
     from mytable
     connect by level <= regexp_count(type, ',')+1)
group by color

How can I solve this?

Thank you!

lareven
  • 379
  • 2
  • 15
  • Now you know how unwieldy a schema becomes when it does not follow [1NF](https://en.wikipedia.org/wiki/First_normal_form). I could consider creating a (temporary) pair of tables that describe the relationship between amounts and colors in a proper relational fashion, then write an obvious query. – 9000 Jun 26 '14 at 21:18
  • Exactly... But I have no control over the database... Could you describe how can I create a temp table? – lareven Jun 26 '14 at 21:27
  • http://stackoverflow.com/questions/2671518/how-to-create-a-temporary-table-in-oracle – 9000 Jun 26 '14 at 21:31
  • Sorry for the confusion, but I asking how to create that relation between amounts and colors... – lareven Jun 26 '14 at 21:54
  • You don't have to explicitly create a relation, since you don't have a well-formed color column in your table. Normally you would use a 'foreign key' constraint. See my answer below for a possible solution. – 9000 Jun 26 '14 at 22:22

2 Answers2

0

Do you have a full list of colors?

Imagine that you do. Let there be a table color(name varchar2(...) not null primary key) that lists every color.

Then you could write something like:

select 
  color.name, sum(crazy_table.amt)
from
  color, crazy_table -- the latter is your original data
where
  crazy_table.type like '%, ' || color.name -- at the end of string
  or 
  crazy_table.type like color.name || ', %' -- at the start of string
  or  crazy_table.type like '%, ' || color.name || ', %' -- middle
  or
  crazy_table.type = color.name -- single entry, no commas
  or
  color.name = 'other' and not exists ( -- no known color matches
    select 1 from color as c2
    where instr(crazy_table.type, c2.name) > 0
  )
group by color.name

It would full-scan crazy_table, which is presumably large, doing index lookups on color table which is presumably much smaller, so performance should be ok.

9000
  • 39,899
  • 9
  • 66
  • 104
0

The following query would get you the desired result. It

1.first splits the color values into separate rows, based on 737905's solution given to convert comma separated string into rows on OTN Discussion Forum.

2.uses the CASE statement to designate the 'other' category for colors besides red, yellow, green, and blue.

3.groups by color

4.orders by a pre-determined order of colors

WITH SPLIT_COLORS AS
  (
    SELECT
      AMT,
      TRIM(EXTRACT(column_value,'/e/text()')) COLOR
    FROM
      mytable x,
      TABLE(XMLSEQUENCE(EXTRACT(XMLTYPE('<ROW><e>'
      ||REPLACE(type,',','</e><e>')
      ||'</e></ROW>'),'//e')))
  )
SELECT
  CASE
    WHEN color NOT IN ('red', 'yellow', 'green', 'blue') THEN 'other'
    ELSE color
  END AS color,
  SUM(amt) amt
FROM
  SPLIT_COLORS
GROUP BY
  CASE
    WHEN color NOT IN ('red', 'yellow', 'green', 'blue') THEN 'other'
    ELSE color
  END
ORDER BY
  CASE color
    WHEN 'red' THEN 1
    WHEN 'yellow' THEN 2
    WHEN 'green' THEN 3
    WHEN 'blue' THEN 4
    ELSE 5
  END;

You can test only the output of the first part (the CTE's - Common Table Expressions), as below:

WITH SPLIT_COLORS AS
  (
    SELECT
      AMT,
      TRIM(EXTRACT(column_value,'/e/text()')) COLOR
    FROM
      mytable x,
      TABLE(XMLSEQUENCE(EXTRACT(XMLTYPE('<ROW><e>'
      ||REPLACE(type,',','</e><e>')
      ||'</e></ROW>'),'//e')))
  )
SELECT *
FROM SPLIT_COLORS;
Joseph B
  • 5,519
  • 1
  • 15
  • 19