0

I have a table, each rows records are related other rows records, some rows are not related to other rows as well.

My records are like this

Item Table

ID          ItemLookupCode       UnitOfMeasure     ParentItem
--------------------------------------------------------------
111         100006C0005          CRT                 0
112         100006B0001          BAG                111       // this row is child of ID 111

221         100027C0002          CRT                 0
222         100027T0012          PCT                221
223         100027P0001          PC                 222

Expecting output

ItemRelation

I tried this following query, it's working fine. Is there any other better solution for performance:

SELECT DISTINCT 
    TOP (100) PERCENT dbo.Item.ID, 
    dbo.Item.ItemLookupCode, 
    dbo.Item.UnitOfMeasure, 
    Item_1.ID AS ChildID1, 
    Item_1.ItemLookupCode AS ChildItemLookupCode1, 
    Item_1.ParentItem AS ChildParentItem1, 
    Item_1.UnitOfMeasure AS ChildUOM1, 
    Item_2.ID AS ChildID2, 
    Item_2.ItemLookupCode AS ChildItemLookupCode2, 
    Item_2.UnitOfMeasure AS ChildUOM2, 
    Item_3.ID AS ChildID3, 
    Item_3.ItemLookupCode AS ChildItemLookupCode, 
    Item_3.UnitOfMeasure AS ChildUOM3
FROM           
    dbo.Item 
LEFT OUTER JOIN
    dbo.Item AS Item_1 ON dbo.Item.ID = Item_1.ParentItem 
LEFT OUTER JOIN
    dbo.Item AS Item_2 ON Item_1.ID = Item_2.ParentItem 
LEFT OUTER JOIN
    dbo.Item AS Item_3 ON Item_2.ID = Item_3.ParentItem 
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Liam neesan
  • 2,282
  • 6
  • 33
  • 72

2 Answers2

1

My guess is that there's no index on ParentItem. And with 3 joins on that field, the full table scans make it slow.
But ID is probably the primary key, so that's indexed.

If adding an index on ParentItem isn't an option?
Then you could go via a temporary table with an index on the parent.

CREATE TABLE #tmpItem (ParentID int, ID int);

INSERT INTO #tmpItem (ParentID, ID)
SELECT ParentItem, ID
FROM dbo.Item;

CREATE CLUSTERED INDEX #IDX_C_tmpItem ON #tmpItem(ParentID);

SELECT --TOP (100) PERCENT 
Item_0.ID AS ID, 
Item_0.ItemLookupCode AS ItemLookupCode, 
Item_0.UnitOfMeasure AS UnitOfMeasure, 
Item_1.ID AS ChildID1, 
Item_1.ItemLookupCode AS ChildItemLookupCode1, 
Item_1.ParentItem AS ChildParentItem1, 
Item_1.UnitOfMeasure AS ChildUOM1, 
Item_2.ID AS ChildID2, 
Item_2.ItemLookupCode AS ChildItemLookupCode2,
Item_2.ParentItem AS ChildParentItem2, 
Item_2.UnitOfMeasure AS ChildUOM2, 
Item_3.ID AS ChildID3, 
Item_3.ItemLookupCode AS ChildItemLookupCode3,
Item_3.ParentItem AS ChildParentItem3, 
Item_3.UnitOfMeasure AS ChildUOM3
FROM (
    SELECT I0.ID as ID0, I1.ID as ID1, I2.ID as ID2, I3.ID as ID3
    FROM #tmpItem AS I0
    LEFT JOIN #tmpItem AS I1 ON (I0.ID  = I1.ParentID)
    LEFT JOIN #tmpItem AS I2 ON (I1.ID  = I2.ParentID)
    LEFT JOIN #tmpItem AS I3 ON (I2.ID  = I3.ParentID)
) Q
LEFT JOIN dbo.Item Item_0 ON Q.ID0 = Item_0.ID
LEFT JOIN dbo.Item Item_1 ON Q.ID1 = Item_1.ID
LEFT JOIN dbo.Item Item_2 ON Q.ID2 = Item_2.ID
LEFT JOIN dbo.Item Item_3 ON Q.ID3 = Item_3.ID;

Below is just an experiment in using a recursive query.
Making use of the index on ID.
Yes I know, it doesn't return parents without children. Please don't judge.

declare @Item table (ID int primary key, ItemLookupCode varchar(11), UnitOfMeasure varchar(3), ParentItem int);
insert into @Item (ID, ItemLookupCode, UnitOfMeasure, ParentItem) values 
(111,'100006C0005','CRT',0), (112,'100006B0001','BAG',111), 
(221,'100027C0002','CRT',0), (222,'100027T0012','PCT',221), (223,'100027P0001','PC',222), 
(224,'100027X0001','XX',223),(225,'100027Y0001','YY',223),
(226,'100027Z0001','ZZ',225);

WITH RCTE AS
(
   select ID as StartID, 0 as PrevID, 0 as Level, ID, ParentItem as ParentID, ItemLookupCode, UnitOfMeasure
   from @Item

   union all

   select RCTE.StartID, RCTE.ID, RCTE.Level + 1, t.ID, t.ParentItem, t.ItemLookupCode, t.UnitOfMeasure
   from RCTE
   join @Item t on (RCTE.ParentID = t.ID)
)
select 
max(case when ReverseLeveL = 0 then ID end) as ID0,
max(case when ReverseLeveL = 0 then ItemLookupCode end) as ItemLookupCode0,
max(case when ReverseLeveL = 0 then UnitOfMeasure end) as UnitOfMeasure0,
max(case when ReverseLeveL = 1 then ID end) as ID1,
max(case when ReverseLeveL = 1 then ItemLookupCode end) as ItemLookupCode1,
max(case when ReverseLeveL = 1 then UnitOfMeasure end) as UnitOfMeasure1,
max(case when ReverseLeveL = 2 then ID end) as ID2,
max(case when ReverseLeveL = 2 then ItemLookupCode end) as ItemLookupCode2,
max(case when ReverseLeveL = 2 then UnitOfMeasure end) as UnitOfMeasure2,
max(case when ReverseLeveL = 3 then ID end) as ID3,
max(case when ReverseLeveL = 3 then ItemLookupCode end) as ItemLookupCode3,
max(case when ReverseLeveL = 3 then UnitOfMeasure end) as UnitOfMeasure3
from (
   SELECT *, row_number() over (partition by StartID order by Level desc)-1 as ReverseLeveL
   from RCTE
   where Level <= 3
    ) Q
group by StartID
having max(case when ReverseLeveL = 1 then ID end) is not null;
LukStorms
  • 28,916
  • 5
  • 31
  • 45
0

The table design you have - Parent ID column- is called an Adjacency List. It is hierarchical data. The adjacency list is one of several ways to represent hierarchical data in SQL.

For your question...

Is there any other better solution for performance?

Is your hierarchy only three levels deep? Great! Then a 3X SELF JOIN will be fine. It is the most straight-forward and best performing approach. Add any missing indices from the SQL Execution plan stick with this approach.


Does your hierarchy have arbitrary depth? For that, recursion is needed. You now have several options. A recursive CTE is the simplest approach and I would suggest you use that unless the performance is too poor. If it is, then Simple-Talk does a great job here comparing the performance of the various methods available: Recusive CTEs, Dynamic SQL, While Loops, etc...

enter image description here

It's a bit of a deep dive, but they do eventually settle on an exotic performance champion.

When traversing the hierarchy we constructed between eight and thirteen levels deep, the set-based WHILE loop avoiding Halloween protection with RECOMPILE option (LAHPwR) is most often the elapsed time winner. Perhaps surprisingly, the non-traditional recursive function seems to do just a little better than the rCTE between six and twelve levels. It is exactly tied with the set-based loop at seven levels.

So yes, there are some complex ways to query arbitrary depth adjacency lists. Some of them better performing than others.

Instead of exotic query methods, a simpler approach could be just keeping a copy of the data in different hierarchical format. One that can be queried much faster. A Nested Set has been shown to be orders of magnitudes faster to query than an Adjacency List. There is an SP at the bottom of this article which shows how to convert an Adjacency List to a Nested Set.

Good luck!

Community
  • 1
  • 1
Troy Witthoeft
  • 2,498
  • 2
  • 28
  • 37