1

I'm novice to SQL (in hive) and trying to calculate every anonymousid's time spent between first event and last event in minutes. The resource table's timestamp is formatted as string, like: "2020-12-24T09:47:17.775Z". I've tried in two ways:

1- Cast column timestamp to bigint and calculated the difference from main table.

select anonymousid, max(from_unixtime(cast('timestamp' as bigint)) - min(from_unixtime(cast('timestamp' as bigint)) from db1.formevent group by anonymousid

I got NULLs after implementing this as a solution.

2- Create a new table from main resource, put conditions to call with 'where' and tried to convert 'timestamp' to date format without any min-max calculation.

create table db1.successtime as select anonymousid, pagepath,buttontype, itemname, 'location', cast(to_date(from_unixtime(unix_timestamp('timestamp', "yyyy-MM-dd'T'HH:mm:ss.SSS"),'HH:mm:ss') as date) from db1.formevent where pagepath = "/account/sign-up/" and itemname = "Success" and 'location' = "Standard"

Then I got NULLs again and I left. It looks like this

Is there any way I can reformat and calculate time difference in minutes between first and last event ('timestamp') and take the average grouped by 'location'?

leftjoin
  • 36,950
  • 8
  • 57
  • 116
Ipek
  • 23
  • 5

2 Answers2

1
select anonymousid,
       (max(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) - 
        min(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) 
       ) / 60
from db1.formevent
group by anonymousid;
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Thank you. I got zero's as an output. I think It's calculating the hourly difference, should I change the divisor? @leftjoin – Ipek Feb 02 '21 at 12:09
  • @Ipek Check this example: `select unix_timestamp('2020-12-24T09:47:17.775Z', "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")- unix_timestamp('2020-12-24T09:47:00.775Z', "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")` Result=17 unix_timestamp gives seconds (bigint). milliseconds are lost if you are converting like this – leftjoin Feb 02 '21 at 12:16
  • @Ipek Read also this post about milliseconds: https://stackoverflow.com/a/58713989/2700344 – leftjoin Feb 02 '21 at 12:24
  • @Ipek @Ipek Milliseconds precision (seconds.milliseconds): `select cast(timestamp(regexp_replace("2019-11-02T20:18:12.123Z", '^(.+?)T(.+?)Z$','$1 $2')) as double)- cast(timestamp(regexp_replace("2019-11-02T20:18:11.000Z", '^(.+?)T(.+?)Z$','$1 $2')) as double) ` – leftjoin Feb 02 '21 at 12:34
  • @Ipek But better to extract and process milliseconds part separately as int to eliminate issues with double precision – leftjoin Feb 02 '21 at 12:37
0

From your description, this should work:

select anonymousid,
       (max(unix_timestamp(timestamp, 'yyyy-MM-dd'T'HH:mm:ss.SSS'),'HH:mm:ss') - 
        min(unix_timestamp(timestamp, 'yyyy-MM-dd'T'HH:mm:ss.SSS'),'HH:mm:ss') 
       ) / 60
from db1.formevent
group by anonymousid;

Note that the column name is not in single quotes.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786