5

I'm trying to perform a very simple query for Firebase events stored in Google BigQuery but I´m not able to find a way to do it.

In the Android app, I´m logging an event like this:

Bundle params = new Bundle();
params.putInt("productID", productId);
params.putInt(FirebaseAnalytics.Param.VALUE, value);
firebaseAnalytics.logEvent("productEvent", params);

So, in BigQuery I have something like this:

 ___________________ _______________________ ____________________________ 
| event_dim.name    | event_dim.params.key  | event_dim.params.int_value | 
|___________________|_______________________|____________________________|
| productEvent      | productID             | 25                         | 
|                   |_______________________|____________________________| 
|                   | value                 | 1253                       |
|___________________|_______________________|____________________________| 

When I get the data from this table I get two rows:

 ___________________ _______________________ ____________________________
|event_dim.name     | event_dim.params.key  | event_dim.params.int_value |
|___________________|_______________________|____________________________|
| productEvent      | productID             | 25                         |
| productEvent      | value                 | 12353                      |

But what I really need is a SELECT clause from this table to get the data as below:

 ___________________ _____________ _________
|   name            | productID   | value   |
|___________________|_____________|_________|
| productEvent      | 25          | 12353   |

Any idea or suggestion?

Frank van Puffelen
  • 565,676
  • 79
  • 828
  • 807
alvaro torrico
  • 675
  • 6
  • 17
  • You have a low rate. Important on SO, you have to mark accepted answers by using the tick on the left of the posted answer, below the voting. This will increase your rate. See how this works by visinting this link: http://meta.stackoverflow.com/questions/5234/how-does-accepting-an-answer-work#5235 – Pentium10 Feb 05 '17 at 14:49
  • You're right, thanks – alvaro torrico Feb 06 '17 at 17:16

2 Answers2

7

You can pivot the values into columns like this

SELECT 
  event_dim.name as name,
  MAX(IF(event_dim.params.key = "productID", event_dim.params.int_value, NULL)) WITHIN RECORD productID,
  MAX(IF(event_dim.params.key = "value", event_dim.params.int_value, NULL)) WITHIN RECORD value,
FROM [events] 

In case you want to generate this command using SQL, see this solution: Pivot Repeated fields in BigQuery

Community
  • 1
  • 1
Pentium10
  • 204,586
  • 122
  • 423
  • 502
4

Using standard SQL (uncheck "Use Legacy SQL" under "Show Options" in the UI), you can express the query as:

SELECT 
  event_dim.name as name,
  (SELECT value.int_value FROM UNNEST(event_dim.params)
   WHERE key = "productID") AS productID,
  (SELECT value.int_value FROM UNNEST(event_dim.params)
   WHERE key = "value") AS value
FROM `dataset.mytable` AS t,
  t.event_dim AS event_dim;

Edit: updated example to include int_value as part of value based on the comment below. Here is a self-contained example that demonstrates the approach as well:

WITH T AS (
  SELECT ARRAY_AGG(event_dim) AS event_dim
  FROM (
    SELECT STRUCT(
        "foo" AS name,
        ARRAY<STRUCT<key STRING, value STRUCT<int_value INT64, string_value STRING>>>[
          ("productID", (10, NULL)), ("value", (5, NULL))
        ] AS params) AS event_dim
    UNION ALL
    SELECT STRUCT(
        "bar" AS name,
        ARRAY<STRUCT<key STRING, value STRUCT<int_value INT64, string_value STRING>>>[
          ("productID", (13, NULL)), ("value", (42, NULL))
        ] AS params) AS event_dim
  )
)
SELECT 
  event_dim.name as name,
  (SELECT value.int_value FROM UNNEST(event_dim.params)
   WHERE key = "productID") AS productID,
  (SELECT value.int_value FROM UNNEST(event_dim.params)
   WHERE key = "value") AS value
FROM T AS t,
  t.event_dim AS event_dim;
Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99
  • That looks great @Elliott Brossard. unfortunately I get an error when run this query: Error: Not found: Table :event_dim.params – alvaro torrico Oct 12 '16 at 15:39
  • having schema you provided in your question - Elliott's query `must` work and `is` working. So either you have slightly different schema or you something else - like typo in the query. etc. Just recommend to play more with this – Mikhail Berlyant Oct 12 '16 at 16:01
  • Do you mind sharing the query that you tried to run (or a job ID for the failed execution)? Mikhail is right that given your original question, what I posted will work assuming that you replace the `my-dataset.mytable` table name. – Elliott Brossard Oct 12 '16 at 16:13
  • The real name of my column is event_dim.params.value.int_value instead of event_dim.params.int_value. I don't know if that is the reason to make your query fail in my database. I tryied with: SELECT event_dim.name as name, (SELECT int_value FROM UNNEST(event_dim.params.value) WHERE key = "productID") AS productID, (SELECT int_value FROM UNNEST(event_dim.params.value) WHERE key = "value") AS value FROM myTable.app_events_20161010 and the error is: Error: Invalid table name: event_dim:params.value – alvaro torrico Oct 12 '16 at 17:28
  • I'm still not sure how you are getting that error. Can you share a job ID? You can find job IDs for past queries under "Query History" in the UI. I updated my example based on your feedback, though (to include `int_value` inside of `value`). – Elliott Brossard Oct 12 '16 at 18:07
  • The last job ID is: automatic-tea-787:bquijob_5dac956_157ba86d80a By the way, when I uncheck the legacy SQL the error reported is this: Cannot access field name on a value with type ARRAY>>, timestamp_micros INT64, ...>> at [2:13] – alvaro torrico Oct 12 '16 at 20:20
  • All right, I tried once more in my answer :) I missed that there are two levels of repetition--event_dim repeats as well. – Elliott Brossard Oct 12 '16 at 20:39