0

This query was working perfectly fine and fast to complete when I almost have 500 rows in Member_Contact_Edges table. But now, I have nearly 1.000 rows in this table and this query takes 20-30 seconds to complete. I couldn't figure out where the problem is. I tried Clustered and Non-Clustered index. I tried every combination of indexes but no success.

 ;WITH transitive_closure(member_a, member_b, distance, path_string) AS

  (SELECT member_a, member_b, 1 AS distance, CAST(member_a as varchar(MAX)) + '.' + CAST(member_b as varchar(MAX)) + '.' AS path_string
          FROM Member_Contact_Edges
          WHERE member_a = @source AND contact_durum=1 -- source

   UNION ALL

   SELECT tc.member_a, e.member_b, tc.distance + 1, CAST(tc.path_string as varchar(MAX)) + CAST(e.member_b as varchar(MAX)) + '.' AS path_string
          FROM Member_Contact_Edges AS e
          JOIN transitive_closure AS tc ON e.member_a = tc.member_b
          WHERE tc.path_string NOT LIKE '%' + CAST(e.member_b as varchar(MAX)) + '.%' AND e.contact_durum=1
   )

   SELECT distance, path_string FROM transitive_closure
          WHERE member_b=@target AND distance <= 3 -- destination
          ORDER BY member_a, member_b, distance;

This is how I call Stored Procedure:

   Exec Contacts_KacinciDerece @source = 30284, @target=24688

The output: (It's what I expected and this query creates this)

Expected output but takes so much time to create

Thanks.

Burak F. Kilicaslan
  • 535
  • 2
  • 8
  • 20
  • In a recursive CTE (which is what you have here), doubling the number of records in the source tables can lead to exponential processing time increases since it's evaluating against itself. Doubling the number of rows means it is comparing 3x the number of rows (500*500 vs 1000*1000) – JNK Jun 28 '11 at 15:39
  • Correction - 4x the number of rows, 250k vs 1m – JNK Jun 28 '11 at 15:45

2 Answers2

3

You have a path_string NOT LIKE '%' + CAST(e.member_b as varchar(MAX)) + '.%'

  • There is no way to optimise either a NOT LIKE or leading wildcard LIKE, let alone both
  • path_string is calculated field anyway

Every extra row in Member_Contact_Edges multiplies the number of rows that must be scanned (the square of) without any benefit of indexes.

This O(n^2) at least: I suspect higher...

gbn
  • 422,506
  • 82
  • 585
  • 676
  • 3
    +1 - Few things show issues in optimization as quickly as a recursive CTE - when your bad code is built off other bad code it slows down quick. – JNK Jun 28 '11 at 15:55
  • That's right, any suggestion how to get an output like this? I opened a thread for this issue but I could't find any other solution except recursive CTE [Previous post](http://stackoverflow.com/questions/5768904/sql-server-how-to-select-first-second-and-third-degree-contacts) - [Source](http://techportal.ibuildings.com/2009/09/07/graphs-in-the-database-sql-meets-social-networks/) – Burak F. Kilicaslan Jun 28 '11 at 16:23
  • 1
    Compare e.member_b to tc.member_b instead of tc.path_string. – Phil Helmer Jun 28 '11 at 16:37
  • @Phil Helmer So you think the main performance issue here is NOT LIKE statement? Am I right? Let me try it. – Burak F. Kilicaslan Jun 28 '11 at 16:42
  • @Phil Helmer I couldn't manage to run edited query. 'JOIN transitive_closure AS tc ON e.member_a = tc.member_b AND e.member_b<>tc.member_b WHERE e.contact_durum=1' Error Desc: The statement terminated. The maximum recursion 100 has been exhausted before statement completion. – Burak F. Kilicaslan Jun 28 '11 at 17:23
  • @Phil Helmer, @JNK I edited the question as an answer. Please check. – Burak F. Kilicaslan Jun 28 '11 at 19:42
0

Looks like NOT LIKE statement is the key for performance. I managed to run the query in 160ms by limiting the distance.

But a problem occurs here:

If I don't use NOT LIKE statement, it selects the same person twice or more because of the recursive selection.

For example;

;WITH transitive_closure(member_a, member_b, distance, path_string) AS

(SELECT member_a, member_b, 1 AS distance, CAST(member_a as varchar(MAX)) + '.' + CAST(member_b as varchar(MAX)) + '.' AS path_string
      FROM Member_Contact_Edges
      WHERE member_a = @source AND contact_durum=1 -- source

UNION ALL

SELECT tc.member_a, e.member_b, tc.distance + 1, CAST(tc.path_string as varchar(MAX)) + CAST(e.member_b as varchar(MAX)) + '.' AS path_string
      FROM Member_Contact_Edges AS e
      JOIN transitive_closure AS tc ON e.member_a = tc.member_b
      WHERE tc.member_b <> e.member_b AND tc.distance<4 AND e.contact_durum=1
)

SELECT distance, path_string FROM transitive_closure
      WHERE member_b=@target AND distance < 4 -- destination
      ORDER BY member_a, member_b, distance;
Burak F. Kilicaslan
  • 535
  • 2
  • 8
  • 20