3

I am looking for a SQL query that computes the average time to comment (measured for every month).

I was able to write a query that measures the average time between the original post datetime and the comment datetime but still this is not correct as the time should be measured between the current comment and the previous one, as they are related most of the time.


select 
  dateadd(month, datediff(month, 0, Comments.creationdate),0) [Date],
  AVG(CAST(DATEDIFF(hour, Posts.CreationDate, Comments.creationdate ) AS BigInt)) [DelayHours]
  from comments
    INNER JOIN posts ON Comments.PostId = Posts.Id
  GROUP BY
    dateadd(month, datediff(month, 0, Comments.creationdate),0)
  ORDER BY Date
sorin
  • 161,544
  • 178
  • 535
  • 806
  • 2
    What version of SQLServer? If it's sqlserver 2012, you can use LAG (http://msdn.microsoft.com/en-us/library/hh231256.aspx)... – a1ex07 Jan 19 '13 at 15:11
  • I have no idea what version of SQL server it is, but I am running the query from http://data.stackexchange.com – sorin Jan 19 '13 at 15:26
  • 1
    It's `Microsoft SQL Server 2012 (SP1) - 11.0.3000.0 (X64)` (you can run `select @@version` in the Data Explorer interface) – Cristian Lupascu Jan 19 '13 at 15:28
  • Do you want to find the average time between original post datetime and the **first** comment datetime? – ypercubeᵀᴹ Jan 19 '13 at 15:54
  • Average time between each operation, meaning: for the first comment is comment date - post date, for the 2nd comment is time of 2nd comment minus the time of the 1st comment... – sorin Jan 19 '13 at 20:25

1 Answers1

1

I think something like this should work. Sorry, I cannot test it at the moment; I apologize in case I made a misprint.

WITH cte1 AS
(
SELECT c.PostId, c.creationdate, 
ROW_NUMBER() OVER (PARTITION BY c.PostId ORDER BY c.creationdate) AS rn
FROM comments c
)
SELECT dateadd(month, datediff(month, 0, a.creationdate),0) [Date],
AVG(diff_hr) AS avg_diff
FROM 
(
  SELECT a1.PostId, a1.creationdate,
  CASE
   WHEN a1.rn = 1 THEN 
    CAST(DATEDIFF(hour,p.creationdate,a1.creationdate) AS BIGINT) 
   ELSE
    CAST(DATEDIFF(hour,a2.creationdate,a1.creationdate) AS BIGINT) 
  END AS diff_hr
  FROM cte1 a1
  INNER JOIN posts p ON (p.Id = a1.PostId)
  LEFT JOIN cte1 a2 ON (a2.PostId = a1.PostId AND a2.rn = a1.rn-1)
)a
GROUP BY dateadd(month, datediff(month, 0, a.creationdate),0)

Update For SQLServer 2012 LAG will simplify the solution... I noticed comment about version too late .

Update 2 Misprints fixed (missed FROM clause and p.PostId changed to p.Id to match table definition)

a1ex07
  • 36,826
  • 12
  • 90
  • 103
  • +1 for the `LAG` comment. I just learned something useful today. :) – Cristian Lupascu Jan 19 '13 at 15:38
  • 1
    Regarding the query above - I think you're missing a `FROM` clause in the CTE definition. Try running the query in this interface: http://data.stackexchange.com/stackoverflow/query/new – Cristian Lupascu Jan 19 '13 at 15:41
  • I just couldn't resist and made a version using `LAG`: http://data.stackexchange.com/stackoverflow/query/edit/92805 – Cristian Lupascu Jan 19 '13 at 15:57
  • the negative values in the dataset are due to the fact that some *edited* posts have a CreationDate long after some of the comments (for example, see the first comment here: http://stackoverflow.com/a/93952/390819) – Cristian Lupascu Jan 19 '13 at 15:59
  • @w0lf: Yeah, using `lag` makes the query way simpler... By the time I realized you are using 2012 I had already posted "non-lag" version which seems to be working as well. – a1ex07 Jan 19 '13 at 16:03