0

I am trying to fetch data from stackexchange database. My query is:

select distinct top 50 U.Id, U.DisplayName, U.Reputation,
       Tags = stuff( (SELECT ','+p2.Tags 
                       FROM posts p2 join votes V on p2.id = V.PostId
                       where V.VoteTypeId=5 and V.UserId = U.id
                       order by p2.CreationDate DESC
                       FOR XML PATH, TYPE).value('.[1]','nvarchar(max)')
                    ,1,1,'')
from Users U 
order by U.Reputation DESC;

However, when I run the query on data.stackexchange.com, it shows an error saying:Execution Timeout Expired. Is there any way that I can modify the query to optimize the execution time so that I can run this query successfully?

Leo Li
  • 73
  • 1
  • 3
  • 10
  • Post the execution plan here https://www.brentozar.com/pastetheplan/ and share it in the question. Also add the `index` details – Pரதீப் Nov 16 '17 at 04:50
  • I dont have the offline data from the database I dont know how to get the execution plan online through stackexchange database, could you let me know how to? @Pரதீப் – Leo Li Nov 16 '17 at 04:52

1 Answers1

0

You need exclude subqueries with individual XML. This will require building common XML with structure <user ...><tag1><tag2>...</user><user>... (using EXPLICIT MODE). And parse it to rows for individual user's.

with U as(
    select top 50 id, Reputation, DisplayName
      from users order by Reputation DESC
),
V as(
 select U.*,p2.Tags, p2.CreationDate
  from U, posts p2, votes V
 where p2.id = V.PostId and V.VoteTypeId=5 and V.UserId = U.id
),
Q(XML) as(
    select tag,parent,id [user!1!id],Reputation [user!1!Reputation],Name [user!1!Name], T [T!2!!element]
      from (
        select 1 tag, NULL parent, U.id, U.Reputation, U.DisplayName Name, NULL t, NULL dt
          from U
         union all
        select 2, 1, V.id, NULL, NULL, ','+Tags, CreationDate
          from V
     ) X
     order by id, tag, dt desc
       for xml explicit, type
)
select node.value('(@id)[1]','int'),
       node.value('(@Reputation)[1]','int'),
       node.value('(@Name)[1]','nvarchar(max)'),
       stuff(node.value('.[1]','nvarchar(max)'),1,1,'')
  from Q
 cross apply XML.nodes('/user') as Y(node)
 order by 2 desc
Mike
  • 1,985
  • 1
  • 8
  • 14
  • Thank you! It works! But when I try to retrieve `top 5000`, it can be case where expectation timeout. Could you help me retrieve the top 5000 rows? – Leo Li Nov 16 '17 at 20:40
  • @LeoLi For top 5000 users there are 500k voices, this is too much for unloading. – Mike Nov 16 '17 at 20:56