I'm having a little problem getting SQL Server returning the same results as Oracle for our SQL tree query.
We have a link table with attributes
CHILD_ID, LINK_ID, PARENT_ID
and a element table with ELEMENT_ID
In Oracle we use this
SELECT * FROM LINKS
JOIN ELEMENTS ON ELEMENT_ID = CHILD_ID
WHERE LINK_ID IN
(SELECT LINK_ID FROM LINKS CONNECT BY PRIOR CHILD_ID = PARENT_ID START WITH PARENT_ID = 'startid')
In SQL Server we use this
WITH TREE_LINKS AS
(SELECT CHILD_ID, LINK_ID FROM LINKS WHERE PARENT_ID = 'startid'
UNION ALL
SELECT CURRENT_LINKS.CHILD_ID, CURRENT_LINKS.LINK_ID
FROM LINKS CURRENT_LINKS
INNER JOIN TREE_LINKS t1 ON CURRENT_LINKS.PARENT_ID = t1.CHILD_ID)
SELECT * FROM TREE_LINKS
INNER JOIN LINKS ON TREE_LINKS.LINK_ID = LINKS.LINK_ID
INNER JOIN ELEMENTS ON ELEMENTS.ELEMENT_ID = TREE_LINKS.CHILD_ID
This works perfectly fine apart from 1 issue.
In Oracle we only get each unique link, based on LINK_ID
. In SQL Server we get all the links describing the full tree which can include duplicates when 1 element exists below multiple other elements in different branches of the structure.
This means we get a large amounts of duplicate rows from SQL Server. I tested adding
SELECT DISTINCT TREE_LINKS.LINK_ID AS TREE_LINK_ID, * from TREE_LINKS
in the last select statement but it just takes as long as the server has more work to remove the duplicates if found in different branches.
In 1 test case at the moment we have Oracle returning 20,000 rows and SQL Server returning 1.6 million rows. So far I have found no way to get SQL Server returning the same results as fast.
FYI : Adding DISTINCT in the recursion causes
DISTINCT operator is not allowed in the recursive part of a recursive common table expression 'TREE_LINKS'.
Edit: - An example
If we have links like this
PARENT_ID, LINK_ID, CHILD_ID
1 1 2
2 2 3
3 3 4
1 4 3
There are 4 unique elements, but there are 6 elements in the complete tree. This is because there are 2 paths to element 3