-1

I am trying to calculate the time difference between events (in seconds) on a kiosk machine. Each time a person begins using the screen it creates a VisitID. Within that visit, a series of computer events are created that determine the beginning of the screen and the ending of the screen I have identified each event for each screen. Depending on what choices the user makes, they may or may not go through each event or screen. My data looks like this:

VisitID       ScreenTime           EventName
1              13:24:08         WelcomeScreen_Begin
1              13:24:15         Welcome_End
1              13:24:15         NameScreenInit_Start
1              13:24:32         Name_Screen_Terminate
1              13:24:33         CompanyScreen_Enter
1              13:24:45         Company_Lookup
1              13:24:45         Signoff_Now
1              13:24:52         SignOutEnd
2              13:45:02         WelcomeScreen_Begin
2              13:45:15         Welcome_End
2              13:45:15         NameScreenInit_Start
2              13:45:40         Name_Screen_Terminate
2              13:45:40         Signoff_Now
2              13:45:58         SignOutEnd

I would like for my results to look like this:

VisitID       WelcomeScreen       NameScreen      CompanyScreen    SignoffScreen
1               00:00:7           00:00:17           00:00:12          00:00:07
2               00:00:13          00:00:25              NULL           00:00:18

Ultimately, I would like to calculate the average time spent on EACH screen:

WelcomeScreenAVG      NameScreenAVG      CompanyScreenAVG      SignoffScreenAVG
00:00:10              00:00:21           00:00:12              00:00:13

I have been able to use subscripts to grab each timestamp within a DATEDIFF clause, but it ONLY works if I include the VisitID in the WHERE clause. I have hundreds of thousands of entries so that isn't the way to do it. I've been racking my brain on this for days. Does anyone have a solution, or at least know what I should be looking to do instead? Thanks in advance!

There are a few occurrences when the only event available to signify the beginning of one screen is the same event that ends the prior screen. For Example:

VisitID       ScreenTime       EventName
34            08:34:36         Delivery_summaryConfirm
34            08:34:47         Seal_questionsBegin
34            08:35:17         EndSeal_Inquiry

^^This is 2 Screens^^ In the example above, While Seal_questionsBegin is the Beginning of the SealInquiry Screen, it is the only event able to signify the end of the DeliverySummary Screen. This is because the DeliverySummary Screen is always followed by a SealInquiry Screen.

  • The event are a stable few or a long list of variable strings? In the second case you may need a dynamic pivot – Serpiton May 13 '14 at 14:34

3 Answers3

0

Split your dataset into "begins" and "ends", then do a simple join and a group by:

SELECT 
    Begins.ScreenName,
    AVG(DateDiff(second, Begins.ScreenTime, Ends.ScreenTime)) AS TimeSpentSeconds
FROM (
    SELECT 
        VisitID, 
        LEFT(EventName, LEN(EventName)-5) AS ScreenName, 
        ScreenTime 
    FROM MyTable WHERE EventName LIKE '%Begin'
) AS Begins
INNER JOIN (
    SELECT 
        VisitID, 
        LEFT(EventName, LEN(EventName)-3) AS ScreenName, 
        ScreenTime 
    FROM MyTable WHERE EventName LIKE '%End'
) AS Ends
ON Begins.VisitID = Ends.VisitID AND Begins.ScreenName = Ends.ScreenName
GROUP BY Begins.ScreenName

This returns the data in a row-wise fashion:

ScreenName      TimeSpentSeconds
----------      ----------------
WelcomeScreen   10
NameScreen      21
CompanyScreen   12
SignoffScreen   13

Use PIVOT to change this into a column-wise layout. To change the seconds into hours-minutes-seconds, see for example this post.

Community
  • 1
  • 1
Dan
  • 10,480
  • 23
  • 49
  • getting an Ambiguous column name 'ScreenName' error also this isnt returning data as OP requests – sarin May 13 '14 at 14:10
  • My bad. Change `SELECT ScreenName ...` to `SELECT Begins.ScreenName ...`, and likewise in the `GROUP BY` part of the query. I've updated my answer. – Dan May 13 '14 at 14:13
  • Thank you so much! In my data, it isn't exactly like the example I gave. The 's arent as simple as EventBegin and EventEnd. They are actually quite random. I'm sorry for making it look like that. What should I do instead of using the LEN(EventName)-5 and LEN(EventName)-3? For Example a few of the names of my events are "PG_IG_DriverID_HostInquiry" and "PG_IG_DriverID_HostDriverInfo". These events are the beginning and end of one screen.

    **I have identified the two events that make up each screen**

    – Beauner247 May 13 '14 at 14:31
  • The query above filters only those events whose names end with "End" or "Begin". Do you need to take other names into account? In that case, I think your best option would be to create a temporary table that specifies how eventnames should be grouped, for example the table could hold one record with "TestEventABCStart", another with "TestEventXYZStop". Both would then have a column with the value "TestEvent" for grouping. That way, you could join with this temporary table and then use the grouping column name in the query above. – Dan May 13 '14 at 14:36
0

UPDATE

If you have a fixed event list for each screen you can do the below. I think it looks easier to read and is simpler than other posts...but that's me.

New Sql Fiddle

With tblUserTimes as (
  Select VisitId, 

  DateDiff(second, 
  Max(Case When EventName = 'WelcomeScreen_Begin' Then ScreenTime End),
  Max(Case When EventName = 'Welcome_End' Then ScreenTime End)) as WelcomeScreen,

  DateDiff(second, 
  Max(Case When EventName = 'NameScreenInit_Start' Then ScreenTime End),
  Max(Case When EventName = 'Name_Screen_Terminate' Then ScreenTime End)) as NameScreen,

  DateDiff(second, 
  Max(Case When EventName = 'CompanyScreen_Enter' Then ScreenTime End),
  Max(Case When EventName = 'Company_Lookup' Then ScreenTime End)) as CompanyScreen,

  DateDiff(second, 
  Max(Case When EventName = 'Signoff_Now' Then ScreenTime End),
  Max(Case When EventName = 'SignOutEnd' Then ScreenTime End)) as SignoffScreen

  from Visits
  Group By VisitId)

Select AVG(WelcomeScreen) as WelcomeScreenAvg,
       AVG(NameScreen) as NameScreenAvg,
       AVG(CompanyScreen) as CompanyScreenAvg,
       AVG(SignoffScreen) as SignoffScreenAvg
from tblUserTimes

See this SQL Fiddle for example with pivot.

sarin
  • 5,227
  • 3
  • 34
  • 63
  • I have edited my original Question to represent my data more accurately. The Events are not so simply named. They are quite random. For Example a few of the names of my events are "PG_IG_DriverID_HostInquiry" and "PG_IG_DriverID_HostDriverInfo". These events are the beginning and end of one screen. I've identified all the events to each screen. – Beauner247 May 13 '14 at 14:48
0

The first think you need is to normalize the event name, the query take care of that, if can you should normalize the data or at least create the a translate table in you DB

Edit
The events are more than what I thought initially, the translate should be created as a full fledged table.

We can take advantage of the Translate table, adding the IsBegin and IsEnd bit fields and mapping the NormName field to the screen name, in addition to the dictionary, result in a simplified logic in the Times CTE.

CREATE TABLE NormalizeEvents (
  RealName VARCHAR(50)
, NormName VARCHAR(50)
, IsBegin BIT
, IsEnd BIT
)

INSERT INTO NormalizeEvents (RealName, NormName, IsBegin, IsEnd)
VALUES ('WelcomeScreen_Begin', 'WelcomeScreen', 1, 0)
     , ('Welcome_End', 'WelcomeScreen', 0, 1)
     , ('NameScreenInit_Start', 'NameScreen', 1, 0)
     , ('Name_Screen_Terminate', 'NameScreen', 0, 1)
     , ('CompanyScreen_Enter', 'CompanyScreen', 1, 0)
     , ('Company_Lookup', 'CompanyScreen', 0, 1)
     , ('Signoff_Now', 'SignoffScreen', 1, 0)
     , ('SignOutEnd', 'SignoffScreen', 0, 1)
     , ('Delivery_summaryConfirm', 'DeliverySummary', 1, 0)
     , ('Seal_questionsBegin', 'DeliverySummary', 0, 1)
     , ('Seal_questionsBegin', 'SealQuestions', 1, 0)
     , ('EndSeal_Inquiry', 'SealQuestions', 0, 1)

In case of an event name that is both the end of an event and the begin of a new one it should be inserted twice, as 'Seal_questionsBegin' in the example

WITH Times AS (
  SELECT VisitID, EventName = t.NormName
       , ScreenTimeBegin = Max(CASE t.IsBegin
                                    WHEN 1 THEN ScreenTime
                                    ELSE NULL
                               END)
       , ScreenTimeEnd = Max(CASE t.IsEnd
                                  WHEN 1 THEN ScreenTime
                                  ELSE NULL
                             END)
  FROM   Events e
         INNER JOIN NormalizeEvents t ON e.EventName = t.RealName
  GROUP BY VisitID, NormName
)
SELECT VisitID
     , [WelcomeScreen] = CAST(DATEADD(ss, WelcomeScreen, '1900-01-01') as Time)
     , [CompanyScreen] = CAST(DATEADD(ss, CompanyScreen, '1900-01-01') as Time)
     , [NameScreen] = CAST(DATEADD(ss, NameScreen, '1900-01-01') as Time)
     , [SignoffScreen] = CAST(DATEADD(ss, SignoffScreen, '1900-01-01') as Time)
     , [DeliverySummary]=CAST(DATEADD(ss, DeliverySummary, '1900-01-01') as Time)
     , [SealQuestions] = CAST(DATEADD(ss, SealQuestions, '1900-01-01') as Time)
FROM   (SELECT VisitID, EventName
             , Duration = DateDiff(ss, ScreenTimeBegin, ScreenTimeEnd)
        FROM   Times) SOURCE
       PIVOT (AVG(Duration) FOR EventName IN
              ([WelcomeScreen], [CompanyScreen], [NameScreen], [SignoffScreen]
             , [DeliverySummary], [SealQuestions])
             ) pvt

SQLFiddle demo.

To get the global average per screen just remove the VisitID field in the Source of the PIVOT and the main query.

WITH Times AS (
  SELECT VisitID, EventName = t.NormName
       , ScreenTimeBegin = Max(CASE t.IsBegin
                                    WHEN 1 THEN ScreenTime
                                    ELSE NULL
                               END)
       , ScreenTimeEnd = Max(CASE t.IsEnd
                                  WHEN 1 THEN ScreenTime
                                  ELSE NULL
                             END)
  FROM   Events e
         INNER JOIN NormalizeEvents t ON e.EventName = t.RealName
  GROUP BY VisitID, NormName
)
SELECT [WelcomeScreen] = CAST(DATEADD(ss, WelcomeScreen, '1900-01-01') as Time)
     , [CompanyScreen] = CAST(DATEADD(ss, CompanyScreen, '1900-01-01') as Time)
     , [NameScreen] = CAST(DATEADD(ss, NameScreen, '1900-01-01') as Time)
     , [SignoffScreen] = CAST(DATEADD(ss, SignoffScreen, '1900-01-01') as Time)
     , [DeliverySummary]=CAST(DATEADD(ss, DeliverySummary, '1900-01-01') as Time)
     , [SealQuestions] = CAST(DATEADD(ss, SealQuestions, '1900-01-01') as Time)
FROM   (SELECT EventName
             , Duration = DateDiff(ss, ScreenTimeBegin, ScreenTimeEnd)
        FROM   Times) SOURCE
       PIVOT (AVG(Duration) FOR EventName IN
              ([WelcomeScreen], [CompanyScreen], [NameScreen], [SignoffScreen]
             , [DeliverySummary], [SealQuestions])
             ) pvt

SQLFiddle demo (out of date for the data).

As the version of SQLServer is not specified in the question I assume is SQLServer 2012 or better.

Serpiton
  • 3,676
  • 3
  • 24
  • 35
  • This looks like it will work. There are definitely times were screens are used for longer than 59 seconds. Can I use DATEDIFF somehow? – Beauner247 May 13 '14 at 15:17
  • Will this still work when the beginning of one screen shares the same event as the end of the prior screen? That situation does happen a few times in my data. – Beauner247 May 13 '14 at 18:46
  • I have edited the original question to explain the situation in detail. – Beauner247 May 13 '14 at 19:15
  • I'm receiving an error saying on the DATEADD statements saying: WelcomeScreen, CompanyScreen, NameScreen, SignoffScreen, DeliverySummary, SealQuestions are MSG 207 invalid Column names. – Beauner247 May 14 '14 at 18:34
  • Great thank you! It is returning results now. It finds the duration of each screen for each visit. So it is returning 5559 rows and the screen duration for each. How would I change it to give me one global AVG time spent for each screen? – Beauner247 May 14 '14 at 20:29
  • Works like a charm. Thank you so much! Not only does it work perfectly but I learned a lot in the process. Your generosity means a lot. – Beauner247 May 14 '14 at 21:00
  • If I wanted to weed out cases whose durations are too long (say longer than 5 min). I imagine it would be done inside the CASE statement in order to give the lengthy screens a NULL value. Am I correct? – Beauner247 May 15 '14 at 13:03