I have a dataset that looks like the following:
For each OwnerID
, I'd like to calculate the difference in column creationtime
for the current record and the next record (for the same ownerID
), in the form of a new column TimeDiff
. I believe a self join would be required here, but I'm not sure how to use the self join to calculate the difference between the current record and the next record.
While doing this, the very last record for any ownerID
can have a default value of 'NA' as there won't be a next record (for the same ownerID
) to calculate the difference from.
Here's the query that I used to get this dataset:
SELECT DISTINCT ga.ownerid,
mr.name,
SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
EXTRACT(DAY FROM ga.creationtime) AS DAY,
EXTRACT(DOW FROM ga.creationtime) AS DOW,
ga.creationtime,
a.encodedid,
a.name
FROM flx2.groupactivities ga
JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
JOIN flx2.memberroles mr ON mr.id = mhr.roleid
WHERE ga.activitytype = 'assign'
AND ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
AND a.artifacttypeid = 54
AND a.encodedid IS NOT NULL
ORDER BY ga.ownerid,
ga.creationtime,
a.encodedid
I'm using Amazon Redshift
to get this data.
Any help would be appreciated.
TIA!
UPDATE:
I used the method suggested by @systemjack . Here are the results that I get:
We can clearly notice here that the column encodedid
is getting repeated for the same assignmentID
(MAT.PRB.410
, as highlighted in the image above), which shouldn't be the case. This wasn't happening without the LEAD
function, in the query mentioned above. Here's the updated query that I am using (only has an extra LEAD
function):
SELECT DISTINCT ga.ownerid,
mr.name,
SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
EXTRACT(DAY FROM ga.creationtime) AS DAY,
EXTRACT(DOW FROM ga.creationtime) AS DOW,
ga.creationtime,
LEAD(ga.creationtime,1) OVER (PARTITION BY ga.ownerid ORDER BY ga.creationtime) AS nexttime,
a.encodedid,
a.name
FROM flx2.groupactivities ga
JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
JOIN flx2.memberroles mr ON mr.id = mhr.roleid
WHERE ga.activitytype = 'assign'
AND ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
AND a.artifacttypeid = 54
AND a.encodedid IS NOT NULL
ORDER BY ga.ownerid,
ga.creationtime,
a.encodedid LIMIT 1000
The values in the nexttime
column also seem to be jacked up. It seems to be taking the next value in the creationtime
column on ocassion. For example: In the 2nd record, the value of nexttime
column should've been 2013-09-18 06:14:59
instead of 2014-01-18 12:16:49
Why are we getting more records than expected? How do I fix these problems?