
Good Morning! I am looking for a trick to maintain a unique list of keys where a one-to-many relationship can occur.

Problem

I am working with a horribly non-normalized database at work and unfortunately redesign is out of the question. I have a 1NF master table which contains a number of transitive and partial-key dependencies similar to this:

Cmpd_Key | Group | Subgroup | Group_Desc
   A1    |   A   |    1     |   Same
   A2    |   A   |    2     |   Same
   B1    |   B   |    1     |   Same1
   B2    |   B   |    2     |   Same1
   C1    |   C   |    1     |   Diff1
   C2    |   C   |    2     |   Diff2   <-- Group C has two different descriptions

I often need to pull a unique list of Group IDs, but requirements usually demand the Group_Desc field as well. Unfortunately, due to poor data-entry restrictions upstream, this description field can contain multiple entries per Group, which causes duplication since the Group field should be unique in most data pulls. For my purposes I don't really care which Group_Desc record I pull, as long as I can maintain a relationship of one Group to one Group_Desc.

I've come up with an ugly solution that I refer to as an inline view whenever I need to reference the Group_Desc field in a larger query, but this kills my performance:

SELECT Group, Group_Desc
FROM Table t
WHERE Subgroup = (SELECT MIN(Subgroup)
                  FROM Table
                  WHERE Group = t.Group) --Nasty Correlated Subquery
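(For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample table in an in-memory SQLite database via Python and runs the same correlated subquery. The table name `t` and column name `tgroup` are hypothetical stand-ins, since `GROUP` is a reserved word, and an `ORDER BY` was added only to make the output deterministic.)

```python
import sqlite3

# Hypothetical reproduction of the sample table in SQLite.
# "tgroup" stands in for the Group column, since GROUP is a reserved word.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (cmpd_key TEXT, tgroup TEXT, subgroup INTEGER, group_desc TEXT);
    INSERT INTO t VALUES
        ('A1', 'A', 1, 'Same'),  ('A2', 'A', 2, 'Same'),
        ('B1', 'B', 1, 'Same1'), ('B2', 'B', 2, 'Same1'),
        ('C1', 'C', 1, 'Diff1'), ('C2', 'C', 2, 'Diff2');
""")

# The correlated subquery from the question: keep the row whose
# subgroup is the minimum within its group.
rows = conn.execute("""
    SELECT tgroup, group_desc
    FROM t
    WHERE subgroup = (SELECT MIN(subgroup) FROM t t2
                      WHERE t2.tgroup = t.tgroup)
    ORDER BY tgroup
""").fetchall()
print(rows)  # [('A', 'Same'), ('B', 'Same1'), ('C', 'Diff1')]
```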

Question

Does anyone have a performance-friendly trick for repeatedly pulling back a single row per Group within the same query? I'd like to be able to pull back Group and only the first Group_Desc that appears.

I'm envisioning something like this:

SELECT Group, Group_Desc
FROM Table t
GROUP BY Group, Group_Desc    
HAVING ROWNUM = [The lowest returned Rownum within the same Group]

A fellow developer mentioned the RANK function as a possible solution but I didn't see how I could use that to eliminate values.
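For what it's worth, RANK's cousin ROW_NUMBER can eliminate rows: number the rows within each group, then keep only the rows numbered 1. A minimal sketch, using SQLite through Python as a stand-in for Oracle (the analytic syntax is the same; table and column names are hypothetical, with `tgroup` used because `GROUP` is reserved):

```python
import sqlite3

# Hypothetical sample data in an in-memory SQLite database
# (SQLite >= 3.25 supports the same window functions as Oracle here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (cmpd_key TEXT, tgroup TEXT, subgroup INTEGER, group_desc TEXT);
    INSERT INTO t VALUES
        ('A1', 'A', 1, 'Same'),  ('A2', 'A', 2, 'Same'),
        ('B1', 'B', 1, 'Same1'), ('B2', 'B', 2, 'Same1'),
        ('C1', 'C', 1, 'Diff1'), ('C2', 'C', 2, 'Diff2');
""")

# Number the rows within each group by subgroup, then keep only the first.
rows = conn.execute("""
    SELECT tgroup, group_desc
    FROM (SELECT tgroup, group_desc,
                 ROW_NUMBER() OVER (PARTITION BY tgroup
                                    ORDER BY subgroup) AS rn
          FROM t)
    WHERE rn = 1
    ORDER BY tgroup
""").fetchall()
print(rows)  # [('A', 'Same'), ('B', 'Same1'), ('C', 'Diff1')]
```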

Any help you can provide would be greatly appreciated!

----------------EDIT----------------------

So after some additional analysis, I was able to pinpoint an omission in my original correlated subquery which caused an overly long execution plan. By adding a few additional predicates, the optimizer was able to create a better plan, which brought my execution time down from around 12 minutes to 2, in line with my expectations.

I did experiment quite a bit with the analytics solution that Ponder Stibbons suggested below. His solution is quite elegant and I have chosen it as the answer to this question; however, I was unable to use it in this particular query, as its execution time was significantly slower than my original solution, primarily due to an index I was able to utilize in my correlated subquery.

I have no doubt that in a fair comparison the analytics solution would run on par with or better than the correlated-subquery solution. I appreciate everyone's assistance on this problem!

  • Would an order by clause not do what you require? e.g. http://stackoverflow.com/questions/11978136/how-can-i-get-just-the-first-row-in-a-result-set-after-ordering – bob dylan May 06 '15 at 13:06
  • I like the idea, but unfortunately I cannot, as the Subgroup number is not consistent... I listed it here for simplicity, but in actuality it can be a variety of sequential numbers. –  May 06 '15 at 14:36

1 Answer


You can use `min` in its analytic version here; it is fast:

select
    tgroup,
    min(group_desc) over (partition by tgroup) gd
  from t

SQLFiddle demo

first_value is also an option:

select
    tgroup,
    first_value(group_desc) over (partition by tgroup order by subgroup) gd
  from t
Ponder Stibbons
  • This is very interesting. Thanks! I haven't used Analytic Functions before and need to learn more about them. I think this may be the answer I'm looking for but off the cuff I'm not getting the same result set with your queries. Let me play around with these functions a bit and get back to you. –  May 06 '15 at 14:56
  • Yes, query 2 works like yours; query 1 takes min(group_desc). So for subgroups (1 'XYZ', 2 'ABC') the first will always take 'ABC', while the second will always take 'XYZ'. You can use whichever function suits you more. – Ponder Stibbons May 06 '15 at 15:21
  • Okay, so what I'm finding is that this does give me the appropriate field values I want, but it does not eliminate duplicates. So if I have two descriptions and use the OVER function, I get the correct value but I get it twice... I'm trying to figure out now how to group results with an analytic function. –  May 06 '15 at 15:26
  • To eliminate duplicates simply use `distinct`. But as I understood it, you want to display each row separately but take the description from another row, like in SQLFiddle (no need to add any levels with subqueries). – Ponder Stibbons May 06 '15 at 15:36
  • +1 Ah.. so simple I can't believe I missed it. I just want to eliminate the extra rows entirely so Distinct works well. I'm now able to repeat my original set but unfortunately it has actually increased my execution time of the query I'm using to test performance! Still love this solution though and I'm going to try to see if I can utilize some indexed fields to get it to perform better. –  May 06 '15 at 17:31
  • Is there a possibility that you update your question with the (simplified) query which you are executing? In my experience, using analytic functions speeds things up drastically. I suspect that you use this `distinct` in a subquery, and by using the function we can **eliminate** subqueries completely. – Ponder Stibbons May 06 '15 at 17:37
  • Apologies for the late reply, but I was tied up late last week. I've added an edit to the problem as a summary of my findings, but in short I think the poor performance is due to an index I'm utilizing in my correlated subquery that I cannot use in your analytics solution. In 9/10 cases, though, I'd agree your solution is the best, so I'm marking this as the answer. Very much appreciate the assistance! –  May 11 '15 at 13:19
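To tie the comment thread together: the analytic function fills in one description per group, and `distinct` then collapses the duplicated rows. A small sketch of both behaviors, using SQLite through Python as a stand-in for Oracle, with hypothetical table and column names (`tgroup` because `GROUP` is reserved):

```python
import sqlite3

# Hypothetical sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (cmpd_key TEXT, tgroup TEXT, subgroup INTEGER, group_desc TEXT);
    INSERT INTO t VALUES
        ('A1', 'A', 1, 'Same'),  ('A2', 'A', 2, 'Same'),
        ('B1', 'B', 1, 'Same1'), ('B2', 'B', 2, 'Same1'),
        ('C1', 'C', 1, 'Diff1'), ('C2', 'C', 2, 'Diff2');
""")

# Without DISTINCT, the analytic min() keeps one output row per source row.
dup = conn.execute("""
    SELECT tgroup, MIN(group_desc) OVER (PARTITION BY tgroup)
    FROM t
""").fetchall()

# With DISTINCT, the duplicates collapse to one row per group.
uniq = conn.execute("""
    SELECT DISTINCT tgroup, MIN(group_desc) OVER (PARTITION BY tgroup)
    FROM t
    ORDER BY tgroup
""").fetchall()
print(len(dup), uniq)  # 6 [('A', 'Same'), ('B', 'Same1'), ('C', 'Diff1')]
```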