3

I am wondering whether a recursive UDF function in BigQuery is the right solution to what i'm doing. But first, is it possible to run a query from inside the UDF?

I see a similar question here: BigQuery : is it possible to execute another query inside an UDF? but the solution seems to be a workaround that executes straight SQL. In my case, I might have to call the UDF repeatedly/recursively without knowing in advance the number of steps (say 3-7 steps).

It's a simple use-case of building a relationship graph over user-name entries in a table, with X degrees of separation, where X will be supplied by the end-user as an argument. My guess is recursive-style UDF would work well, but is it possible?

****EDIT: More detail on the use-case:**
Consider a table with transaction data, which contains the counterparts in each row, along with some other information:

Buyer, Seller
Bob->Alice
Bob->Carol
Bob->John
John-Peter
John-Sam
Bob->Mary

Suppose I want to visualize the relationship of Bob with his counterparts, with 1 degree of separation (i.e. also showing each counterpart's relationships 1 step removed from Bob). I want to use the force graph like this one here: D3 Force-Collapsible Graph

This graph requires a .JSON file with the following structure:

{
    "name": "Bob", "size":5000,
    "children":
        [
            {"name":"Alice","size":3000},
            {"name":"Carol","size":3000},
            {"name":"John","size":3000,
            "children":[
                         {"name":"Peter","size":3000},
                         {"name":"Sam","size":3000}
                        ]},
            {"name":"Mary","size":3000}
        ]
}

so, with 1 degree of separation, Bob has 4 children, and out of those, John has 2 children. This can go deeper with X degrees of separation, ideally with X provided by the user, but practically can also be hard-coded to say level 3 or 5.

Community
  • 1
  • 1
VS_FF
  • 2,353
  • 3
  • 16
  • 34
  • Are you using legacy or standard SQL (they handle UDFs differently) e.g. https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions#sql-udf-structure – Graham Polley Jan 11 '17 at 12:01
  • Current implementation of BigQuery UDF does not support recursive calls. I suggest you to rather present your use-case along with how you try to solve it and what kind of issue you have so someone will help you . – Mikhail Berlyant Jan 11 '17 at 15:25
  • The use case is to generate a collapsible tree similar to [link](http://mbostock.github.io/d3/talk/20111116/force-collapsible.html) where the number of branches away from the center will be supplied by the user. The relations data is in BigQuery in simple UserA->UserB format and the front end is served via Node.JS as shown in the BigQuery documentation. The answer below based on GENERATE_ARRAY could work. Will investigate... ** as to legacy/standard SQL, I'd use whatever works for the task! – VS_FF Jan 11 '17 at 15:36
  • ideally, you should present example of your input data and desired output, otherwise it will be abstract back and forth based on assumptions - which usually is not productive at all – Mikhail Berlyant Jan 11 '17 at 17:35
  • Sorry for the delay, I updated the main question to include more details on the use-case; hope it's more clear now... – VS_FF Jan 12 '17 at 14:47

2 Answers2

2

You can have a JavaScript UDF make recursive calls, but it can't execute another SQL statement. If you know the number of recursions/iterations in advance, it may be possible to define a SQL function instead, such as:

#standardSQL
CREATE TEMP FUNCTION SumToN(x INT64) AS (
  (SELECT SUM(v) FROM UNNEST(GENERATE_ARRAY(1, x)) AS v)
);

Using GENERATE_ARRAY, you can create a for loop of the desired length. Here's another example that doesn't involve a UDF, but uses GENERATE_ARRAY to concatenate a variable number of strings:

#standardSQL
WITH T AS (
  SELECT 2 AS x, 'foo' AS y UNION ALL
  SELECT 4 AS x, 'bar' AS y)
SELECT
  y,
  (SELECT STRING_AGG(CONCAT(y, CAST(v AS STRING)))
   FROM UNNEST(GENERATE_ARRAY(1, x)) AS v) AS rep_y
FROM T;
+-----+---------------------+
| y   | rep_y               |
+-----+---------------------+
| foo | foo1,foo2           |
| bar | bar1,bar2,bar3,bar4 |
+-----+---------------------+
Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99
  • This looks like it could work -- is this something I can test out in WebUI, or would it only work in command-line? WebUI generates errors for the above code. Ultimately, I will need a statement that I could pass as a parameter to the QueryOptions JavaScript object in node.js -- would it work there? – VS_FF Jan 11 '17 at 15:55
  • Make sure to uncheck "Use Legacy SQL" under "Show Options" to run that (you'll need to add a query that uses it too), or you can put `#standardSQL` at the top of the query editor to try it. – Elliott Brossard Jan 11 '17 at 15:57
  • ah, but of course, it works this way) But how can I actually loop through the generated numbers and execute a SELECT for each? – VS_FF Jan 11 '17 at 17:18
  • You can use a subselect over the elements of `GENERATE_ARRAY`, for example, which lets you accomplish something similar. It may be possible for me or for someone else to give a more specific answer in relation to the question if you sketched out pseudocode for what you're hoping to accomplish. – Elliott Brossard Jan 11 '17 at 17:27
  • Sorry for the delay, I updated the main question to include more details on the use-case; hope it's more clear now... – VS_FF Jan 12 '17 at 12:38
  • "... make recursive calls ..." Are you saying that it is possible to have a recursive javscript UDF function in BigQuery? Do you have an example? I have tried to do something similar but it fails with the not very verbose "QUERY_ERROR" message. – Sourygna Aug 31 '20 at 15:39
2

Try below
It is generic enough and has quite simple pattern to follow if you will need to extend it to more degrees of separation
For the sake of example I introduced logic for size attribute - which is (in below example) literally size of item in terms of number of items in in it (including itself) - so it is essentially count of children + 1
So, enjoy:

#standardSQL
CREATE TEMP FUNCTION size(item STRING) AS (
  (SELECT CAST(IFNULL(1 + (LENGTH(item) - LENGTH(REPLACE(item, 'name', '')))/4, 1) AS STRING))
);
CREATE TEMP FUNCTION dress(parent STRING, children STRING) AS (
  (SELECT CONCAT('{"name":"', parent, '","size":', size(children), IFNULL(CONCAT(',"children":[', children, ']'), ''), '}'))
);
WITH items AS (
  SELECT 'Bob' AS parent, 'Alice' AS child UNION ALL
  SELECT 'Bob' AS parent, 'Carol' AS child UNION ALL
  SELECT 'Bob' AS parent, 'John' AS child UNION ALL
  SELECT 'John' AS parent, 'Peter' AS child UNION ALL
  SELECT 'John' AS parent, 'Sam' AS child UNION ALL
  SELECT 'Peter' AS parent, 'Sam' AS child UNION ALL
  SELECT 'Sam' AS parent, 'Mike' AS child UNION ALL
  SELECT 'Sam' AS parent, 'Nick' AS child UNION ALL
  SELECT 'Bob' AS parent, 'Mary' AS child 
), degree2 AS ( 
  SELECT d1.parent AS parent, d1.child AS child_1, d2.child AS child_2
  FROM items AS d1 LEFT JOIN items AS d2 ON d1.child = d2.parent
), degree3 AS (
  SELECT d1.*, d2.child AS child_3 
  FROM degree2 AS d1 LEFT JOIN items AS d2 ON d1.child_2 = d2.parent
), degree4 AS (
  SELECT d1.*, d2.child AS child_4 
  FROM degree3 AS d1 LEFT JOIN items AS d2 ON d1.child_3 = d2.parent
)
SELECT STRING_AGG(dress(parent, child_1), ',') AS parent FROM (
SELECT parent, STRING_AGG(dress(child_1, child_2), ',') AS child_1 FROM (
SELECT parent, child_1, STRING_AGG(dress(child_2, child_3), ',') AS child_2 FROM (
SELECT parent, child_1, child_2, STRING_AGG(dress(child_3, child_4), ',') AS child_3 FROM (
SELECT parent, child_1, child_2, child_3, STRING_AGG(dress(child_4, NULL), ',') AS child_4 FROM degree4
GROUP BY 1,2,3,4 ORDER BY 1,2,3,4 )
GROUP BY 1,2,3 ORDER BY 1,2,3 )
GROUP BY 1,2 ORDER BY 1,2 ) GROUP BY 1 ORDER BY 1 )  

It returns exactly what you need - see "beautified" version of it below

{"name": "Bob","size": 12,"children": [
    {"name": "Alice","size": 1},
    {"name": "Carol","size": 1},
    {"name": "John","size": 8,"children": [
        {"name": "Peter","size": 4,"children": [
            {"name": "Sam","size": 3,"children": [
                {"name": "Mike","size": 1},
                {"name": "Nick","size": 1} ]}
          ]},
        {"name": "Sam","size": 3,"children": [
            {"name": "Mike","size": 1},
            {"name": "Nick","size": 1} ]}
      ]},
    {"name": "Mary","size": 1}
  ]},
{"name": "John","size": 8,"children": [
    {"name": "Peter","size": 4,"children": [
        {"name": "Sam","size": 3,"children": [
            {"name": "Mike","size": 1},
            {"name": "Nick","size": 1} ]}
      ]},
    {"name": "Sam","size": 3,"children": [
        {"name": "Mike","size": 1},
        {"name": "Nick","size": 1} ]}
  ]},
{"name": "Peter","size": 4,"children": [
    {"name": "Sam","size": 3,"children": [
        {"name": "Mike","size": 1},
        {"name": "Nick","size": 1} ]}
  ]},
{"name": "Sam","size": 3,"children": [
    {"name": "Mike","size": 1},
    {"name": "Nick","size": 1} ]}

Most likely, above can be further generalized - but I thought it is already good enough for you to try :o)

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230