0

I am using AWS Athena to query my data and I need to do a Bitwise Left Shift operation. How can I implement this in Athena?

More context:

I am trying to keep track of events using bitwise operations (shifting bits to the left where 1 means my event happened and 0 means it didn't).

So the chain 11101 would be read from right to left indicating if the event happened as follows: "Yes, no, yes ,yes, yes', where the biggest chain I can build is equal to the MAX size of BIGINT, so I can track at most 63 days

I tried using the operator << as I would in a Hadoop/Hive architecture but it's not available in Athena.

I saw that there's someone requesting that change, but it's not implemented.

https://github.com/prestodb/presto/issues/4028

I was reading more about this topic here:

What are bitwise shift (bit-shift) operators and how do they work?

So I tried implementing my own version of the Non-Circular Bitwise Left Shift but I don't know if it covers all the edge cases.

This is the Hive code I want to "translate" to Athena:

SELECT my_num<<1 as bit_shifted_num 
FROM my_table

This is my code:

SELECT if(CAST(my_num AS BIGINT) = 9223372036854775807 OR CAST(my_num AS BIGINT)*pow(2,1) > 9223372036854775807,from_base(rpad(substr(to_base(CAST(my_num AS BIGINT),2),2,length(to_base(CAST(my_num AS BIGINT),2))-1), length(to_base(CAST(my_num AS BIGINT),2)), '1'),2),CAST(CAST(my_num AS BIGINT)*pow(2,1) AS BIGINT)) AS bit_shifted_num 
FROM my_table

My logic breakdown is:

  1. If my number is equal than 9223372036854775807 OR my number multiplied by 2 (left shift 1 bit) is greater than 9223372036854775807 (as MAX BIGINT * 2 results in the NUMBER data type which is bigger than MAX BIGINT), convert the previous number into it's STRING BIT representation using the function to_base, substring this STRING removing the first digit (reading from left to right) and inserting a 1 at the end of the string, then convert back to it's BIG INT representation using the function from_base.
  2. If the number isn't bigger than MAX BIGINT (9223372036854775807), then just multiply it by 2, which is the equivalent of a left shift of 1 bit.

Am I missing any cases handled by HIVE's << or what implementation do I need so my function deals with the left shift properly in all cases?

Diego Serrano
  • 846
  • 2
  • 15
  • 34
  • 2
    There's currently no easy way to do bitwise shifts in Presto, as you have observed. I've opened a new issue to track this improvement: https://github.com/prestosql/presto/issues/722 – Martin Traverso May 06 '19 at 22:36

1 Answers1

0

Implementing this function through SQL is way more complicated than what I proposed as Hive adds a bit for negative values once the MAX BIG INT value is reached (plus some casting behaviour in Presto make values smaller than MAX BIG INT equal to MAX BIG INT).

My solution was using EMR with the AWS catalogue and running queries on the same tables using Hive SQL, which has an implementation of the bitwise left shift using shift_left.

Diego Serrano
  • 846
  • 2
  • 15
  • 34