1

I have a table that stores logs into one row. I have unnested it and one example row looks like this:

TestObserver(2) TestFragment(1) TestView(1) TestNotifications(2) TestActivity(1) and so on.

I would like to conveert TestObserver, TestFragment and so on into columns and its values (2), (1), etc into its rows. There are plenty of these Test logs, it is not a fixated number of them. There is no delimiter between them. I would also like to SUM each TestObserver, TestFragment and so on values found in each row.

How can I achieve this?

I tried trimming and splitting but the result is not what it is expected.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Try to get a bit clarification here. is there always exactly a space between each term? Such as TestObserver(2)[SPACE]TestFragment(1). – Junjie Feb 23 '23 at 12:19

1 Answers1

0

Assume there is no consistent delimiter(not even a space) between terms in your log, we can still get your desired results by using REGEXP_EXTRACT_ALL() function and UNNEST operator in BigQuery.

Step 1. Use regex_replace() to move all spaces in the log

step 2. Use REGEXP_EXTRACT_ALL() to fetch each (term(quantity)) token into an array.

Step 3. UNNEST the array to one (term(quantity)) token per row

Step 4. Use regex_extract() to fetch term and quantity separately. Please, we need to use negative lookaround regex(?= and ?<=) to get the exact data we need for term and quantity.

Step 5. Group by term and sum() on quantity to get the result.

See the following code as example:

WITH log_extracted AS (
  SELECT
    REGEXP_EXTRACT_ALL(
      REGEXP_REPLACE(
        'TestObserver(2) TestFragment(1) TestView(1) TestNotifications(2) TestActivity(1)',
        '_',
        ''
      ),
      '\w+\(\d+\)'
    ) AS log_tokens
)
SELECT
  REGEXP_EXTRACT(log_token, '(\w+)(?=\(\d+\))') AS term,
  SUM(CAST(REGEXP_EXTRACT(log_token, '(?<=\w\()\d+') AS INT64)) AS quantity
FROM log_extracted
CROSS JOIN UNNEST(log_tokens) AS log_token
GROUP BY
  1

example output:

term quantity
TestFragment 1
TestView 1
TestObserver 2
TestNotifications 2
TestActivity 1
Junjie
  • 1,170
  • 1
  • 2
  • 6