2

I'm trying to calculate a time difference between 2 rows and applied the solution from this SO question. However I get an exception:

> org.apache.hive.service.cli.HiveSQLException: Error while compiling
> statement: FAILED: SemanticException Failed to breakup Windowing
> invocations into Groups. At least 1 group must only depend on input
> columns. Also check for circular dependencies. Underlying error:
> Expecting left window frame boundary for function
> LAG((tok_table_or_col time), 1, 0) Window
> Spec=[PartitioningSpec=[partitionColumns=[(tok_table_or_col
> client_id)]orderColumns=[(tok_table_or_col time) ASC
> NULLS_FIRST]]window(type=ROWS, start=1 PRECEDING, end=currentRow)] as
> LAG_window_0 to be unbounded. Found : 1

HiveQL:

SELECT id, loc, LAG(time, 1, 0) OVER (PARTITION BY id, loc ORDER BY time ROWS 1 PRECEDING) - time AS response_time FROM mytable

How to I fix this? What is the issue?

EDIT:

Sample data:

id  loc time
0   1   1414250523591
0   1   1414250523655
1   2   1414250523655
1   2   1414250523661
1   3   1414250523661
1   3   1414250523662

And what I want is the difference of time between rows with same id and loc (always pairs of 2).

EDIT2: I should also mention I'm new to hadoop/hive ecosystem.

So as the error said, the window should be unbounded. So I just removed the ROWS clause and now at least it is doing something but it still is wrong. So I just wanted to check what the LAG value actually is:

SELECT id, loc, LAG(time, 1) OVER (PARTITION BY id, loc ORDER BY time) AS lag_col FROM mytable

And I get this as output:

id  loc lag_col
1   2   null
1   2   -1
1   3   null
1   3   -1

The null is clear because I removed the default value but why -1? Are the large values in time column leading to somekind of overflow? Column is defined as bigint so it should actually fit without problem but maybe there is a conversion to int during the query?

beginner_
  • 7,230
  • 18
  • 70
  • 127

0 Answers0