1

BigQuery and SQL noob here. I was going through possible data types big query supports here. I have a column in bigtable which is of type bytes and its original data type is scala Long. This was converted to bytes and stored in bigtable from my application code. I am trying to do CAST(itemId AS integer) (where itemId is the column name) in the BigQuery UI but the output of CAST(itemId AS integer) is 0 instead of actual value. I have no idea how to do this. If someone could point me in the right direction then I would greatly appreciate it.

EDIT: Adding more details
Sample itemId is 190007788462

Following is the code which writes itemId to the big table. I have included the relevant method. Using hbase client to write to bigtable.

import org.apache.hadoop.hbase.client._
def toPut(key: String, itemId: Long): Put = {
   val TrxColumnFamily = Bytes.toBytes("trx")
   val ItemIdColumn = Bytes.toBytes("itemId")
   new Put(Bytes.toBytes(key))
  .addColumn(TrxColumnFamily,
             ItemIdColumn,
             Bytes.toBytes(itemId))

}

Following is the entry in big table based on above code

ROW         COLUMN+CELL

foo         column=trx:itemId, value=\x00\x00\x00\xAFP]F\xAA

Following is the relevant code which reads the entry from big table in scala. This works correctly. Result is a org.apache.hadoop.hbase.client.Result

private def getItemId(row: Result): Long = {
  val key = Bytes.toString(row.getRow)
  val TrxColumnFamily = Bytes.toBytes("trx")
  val ItemIdColumn = Bytes.toBytes("itemId")

  val itemId =
    Bytes.toLong(row.getValue(TrxColumnFamily, ItemIdColumn))
  itemId

}

The getItemId function above correctly returns itemId. That's because Bytes.toLong is part of org.apache.hadoop.hbase.util.Bytes which correctly casts the Byte string to Long.

I am using big query UI similar to this one and using CAST(itemId AS integer) because BigQuery doesn't have a Long data type. This incorrectly casts the itemId byte string to integer and resulting value is 0.

Is there any way I can have a Bytes.toLong equivalent from hbase-client in BigQuery UI? If not is there any other way I can go about this issue?

Karan Ashar
  • 1,392
  • 1
  • 10
  • 23
  • Please give some examples of what `itemId` looks like and what the converted values should be. – Elliott Brossard Nov 09 '17 at 02:27
  • I'm confused about this value: `\x00\x00\x00\xAFP]F\xAA`. Is the assumption that `FP]F` is interpreted as bytes as well? The rest of the value is a sequence of hex escapes. I don't see how that is supposed to be interpreted as `190007788462`. – Elliott Brossard Nov 09 '17 at 06:28
  • That is not the exact itemId representation but the column is of type bytes – Karan Ashar Nov 09 '17 at 06:30

1 Answers1

2

Try this:

SELECT CAST(CONCAT('0x', TO_HEX(itemId)) AS INT64) AS itemId
FROM YourTable;

It converts the bytes into a hex string, then casts that string into an INT64. Note that the query uses standard SQL, as opposed to legacy SQL. If you want to try it with some sample data, you can run this query:

WITH `YourTable` AS (
  SELECT b'\x00\x00\x00\xAFP]F\xAA' AS itemId UNION ALL
  SELECT b'\xFA\x45\x99\x61'
)
SELECT CAST(CONCAT('0x', TO_HEX(itemId)) AS INT64) AS itemId
FROM YourTable;
Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99
  • Perfect. That worked well. How did you infer that you needed to `CONCAT('0x', TO_HEX(itemId))` ? – Karan Ashar Nov 09 '17 at 16:45
  • 2
    I've used this pattern before when doing byte manipulation. You can cast decimal strings (e.g. `1234`) to INT64, and you can also cast hexadecimal strings (e.g. `0x04D2`). Since there is a `TO_HEX` function, you can get a hexadecimal string that you can cast after prepending the `0x` :) – Elliott Brossard Nov 09 '17 at 16:50
  • Interesting. Is there any pattern for for `floats` ? I tried the same pattern above but that didn't work. For example I have `price` column which is stored as bytes of a `float` type. Example `price` is `3.0` and in bigtable I see entry such as `@\x02Q\xEB\x85\x1E\xB8R` (might not be value for `3.0`) – Karan Ashar Nov 09 '17 at 16:55
  • I think it might be possible to cast to FLOAT64 as well from a hex representation, but I don't know that you'll get the expected result, and I don't think this functionality is documented. Being able to convert from a hex string to FLOAT64 is a side effect of the number parser, not something intentional I think. – Elliott Brossard Nov 09 '17 at 17:11
  • Sure. I will try that – Karan Ashar Nov 09 '17 at 17:12