0

I want to automate the creation of "directory variables" from a set of URIS until the maximum number of directories is reached.

For example, if I had 4 directories from URI: "/A/B/C/17628.html", I would want to create the following variables:

  1. path_1 = "A"
  2. path_2 = "B"
  3. path_3 = "C"
  4. path_4 = "17628.html"

But if I had : "/A/D/E/F/178.html":

  1. path_1 = "A"
  2. path_2 = "D"
  3. path_3 = "E"
  4. path_4 = "F"
  5. path_5 = "178.html"

It's probable to have a URI with many directories (up to 20). To avoid creating all these variables by hand, I want to define them using the for loop (or another option). It's possible to use this loop in BigQuery?

Graham Polley
  • 14,393
  • 4
  • 44
  • 80
Mario M.
  • 802
  • 11
  • 26
  • What is the expected query output given the URI of `"/A/B/C/17628.html"`? It's not possible to create columns dynamically--unless you do so in your own code--but is the intention to get five rows where each one contains part of the URI? – Elliott Brossard Feb 13 '17 at 10:58
  • My dataset has in the first column the URIs. The query output must have in the columns the different subdirectories (path1, path2 ... pathn). If a URI has 1 subdirectory, path2 ... pathn would be filled as null. – Mario M. Feb 13 '17 at 11:11
  • I want something like this (code is not correct): SELECT FOR( i from 1 to MAX(LENGTH(URI)-LENGTH(REPLACE(URI, '/', '')) ) AS CONCAT(path, i) FROM[my_table_URI] – Mario M. Feb 13 '17 at 11:30

2 Answers2

1

Consider below version

#standardSQL
WITH yourTable AS (
  SELECT '/A/B/C/17628.html' AS uri UNION ALL
  SELECT '/A/D/E/F/178.html' AS uri
)
SELECT uri, CONCAT('path_', CAST(1 + OFFSET AS STRING)) AS pos, path
FROM yourTable, UNNEST(SPLIT(REGEXP_EXTRACT(uri, r'/(.*)/'), '/')) path WITH OFFSET
ORDER BY uri, OFFSET

result is :

uri                 pos     path     
/A/B/C/17628.html   path_1     A     
/A/B/C/17628.html   path_2     B     
/A/B/C/17628.html   path_3     C     
/A/D/E/F/178.html   path_1     A     
/A/D/E/F/178.html   path_2     D     
/A/D/E/F/178.html   path_3     E     
/A/D/E/F/178.html   path_4     F     

In most practical cases, having such a flattened schema versus pivoted - is much more easier to deal (query) with

In case if you still want to pivot above result - see one of my many answers on that topic - Transpose rows into columns in BigQuery (Pivot implementation)

Community
  • 1
  • 1
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0

You need to specify columns in the select list explicitly; it isn't possible for columns themselves to be dynamic. If you are okay with getting the results back as an array, you could do something like this:

#standardSQL
WITH T AS (
  SELECT '/A/B/C/17628.html' AS path UNION ALL
  SELECT '/A/D/E/F/178.html' AS path
)
SELECT
  ARRAY(SELECT IFNULL(subpaths[SAFE_OFFSET(x)], '')
        FROM UNNEST(GENERATE_ARRAY(0, 19)) AS x) AS paths
FROM (
  SELECT SPLIT(path, '/') AS subpaths
  FROM T
);

If you wanted explicit path_1, path_2, etc. columns you could do:

#standardSQL
WITH T AS (
  SELECT '/A/B/C/17628.html' AS path UNION ALL
  SELECT '/A/D/E/F/178.html' AS path
)
SELECT
  subpaths[SAFE_OFFSET(1)] AS path_1,
  subpaths[SAFE_OFFSET(2)] AS path_2,
  subpaths[SAFE_OFFSET(3)] AS path_3,
  subpaths[SAFE_OFFSET(4)] AS path_4,
  subpaths[SAFE_OFFSET(5)] AS path_5,
  subpaths[SAFE_OFFSET(6)] AS path_6,
  subpaths[SAFE_OFFSET(7)] AS path_7,
  subpaths[SAFE_OFFSET(8)] AS path_8,
  subpaths[SAFE_OFFSET(9)] AS path_9,
  subpaths[SAFE_OFFSET(10)] AS path_10,
  subpaths[SAFE_OFFSET(11)] AS path_11,
  subpaths[SAFE_OFFSET(12)] AS path_12,
  subpaths[SAFE_OFFSET(13)] AS path_13,
  subpaths[SAFE_OFFSET(14)] AS path_14,
  subpaths[SAFE_OFFSET(15)] AS path_15,
  subpaths[SAFE_OFFSET(16)] AS path_16,
  subpaths[SAFE_OFFSET(17)] AS path_17,
  subpaths[SAFE_OFFSET(18)] AS path_18,
  subpaths[SAFE_OFFSET(19)] AS path_19,
  subpaths[SAFE_OFFSET(20)] AS path_20
FROM (
  SELECT SPLIT(path, '/') AS subpaths
  FROM T
);

Since I didn't want to write that list by hand, I ran a simple one-liner in my terminal:

for i in `seq 1 20`; do echo "subpaths[SAFE_OFFSET($i)] AS path_$i,"; done
Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99