1

I'm a bit new to SQL, so please be gentle...

I'm attempting to create a [legacy] BigQuery SQL self union like:

SELECT * FROM (query 1), (query 1)

Syntax found from another SO answer here. However query 1 is very long, and I don't want to have to copy/paste the entire query multiple times. Is there a way to name query 1 so I don't have to paste the whole thing over again?

I'm trying to do something like the following (with regards to the references to A):

SELECT 
    * 
FROM 
    (query 1) AS A,
    (SELECT 
        Category as Hier_Level,
        MAX(Department),
        Category,
        "VARIOUS" AS Subcategory,
        SUM(DOLLARS)
     FROM
         A
     GROUPBY
         Category) AS B,
    (SELECT 
        Department as Hier_Level,
        Department,
        "VARIOUS" AS Category,
        "VARIOUS" AS Subcategory,
        SUM(DOLLARS)
     FROM
         A
     GROUPBY
         Department,
         Category) AS C,
    ...

Update

While the question is focused on not repeating query 1's code, the final goal of the aggregation is to take data like the following, which is output by query 1:

Hier_Level  Department  Category Subcategory  Dollars
Subcategory Electronics TV       LCD          3500
Subcategory Electronics TV       OLED         6000
Subcategory Electronics Phone    iPhone       600

and return the same table with rows added for aggregations from groupbys. Results should look like:

Hier_Level  Department  Category Subcategory  Dollars
Subcategory Electronics TV       LCD          3500
Subcategory Electronics TV       OLED         6000
Subcategory Electronics Phone    iPhone       600
Category    Electronics TV       Various      9500
Category    Electronics Phone    iPhone       600
Department  Electronics Various  Various     10100

In standard SQL it's my understanding that something like this can be accomplished using CTE and union all, but neither are available in BQ Legacy SQL.

ZaxR
  • 4,896
  • 4
  • 23
  • 42
  • Could you show the exact query, or at least a representative smaller version of it? Maybe it can be rewritten in a different way. Legacy SQL doesn't support CTE, eliminating one good way to avoid repeating the query. – Tim Biegeleisen Jun 07 '18 at 02:12
  • @TimBiegeleisen - I've attached as much of an example of query 1 as I think I can - does that help? I know it's a bit abstract... – ZaxR Jun 07 '18 at 02:47
  • @ZaxR - you should provide example of input data and expected result, otherwise it will be shooting in the air – Mikhail Berlyant Jun 08 '18 at 02:31
  • @MikhailBerlyant Unfortunately I can't share an example of the data. My ultimate goal is to take query 1 and add rows to that table that are grouped aggregations (sums) of query 1. The various groupings are based on one or more levels of hierarchy. Does that help at all? – ZaxR Jun 08 '18 at 02:35
  • noone is really interested in your specific data! just mimic it and provide some dummy data that would reproduce your problem! – Mikhail Berlyant Jun 08 '18 at 02:36
  • @MikhailBerlyant - Added dummy input/output. Think that's the best I can do. – ZaxR Jun 08 '18 at 02:51
  • make sense now - at least for me - will answer when get time – Mikhail Berlyant Jun 08 '18 at 02:54

2 Answers2

1

Unfortunately I'm restricted to legacy SQL for a couple of other operations.

You can create View named query1 - so you will reference it as [project:dataset.query1] - Make sure you create it in Lagacy mode so you can then use it from query in legacy mode

So, now your query will be exactly (almost) as you asked - I'm trying to do something like the following (with regards to the references to A)

SELECT * FROM 
  [project:dataset.query1],
  (SELECT 
    'Category' AS Hier_Level,
    MAX(Department) Department,
    Category,
    'Various' AS Subcategory,
    SUM(Dollars) AS Dollars
   FROM [project:dataset.query1]
   GROUP BY Category),
  (SELECT 
    'Department' AS Hier_Level,
    Department,
    'Various' AS Category,
    'Various' AS Subcategory,
    SUM(Dollars) AS Dollars
   FROM [project:dataset.query1]
   GROUP BY Department) 

Having your example - the result will be as expected

Row Hier_Level  Department  Category    Subcategory Dollars  
1   Subcategory Electronics TV          LCD          3500    
2   Subcategory Electronics TV          OLED         6000    
3   Subcategory Electronics Phone       iPhone        600    
4   Category    Electronics TV          Various      9500    
5   Category    Electronics Phone       Various       600    
6   Department  Electronics Various     Various     10100    
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Thanks! It's a bit of a bummer this can't be done without an intermediate view, because the query is created dynamically from a Python program that will result in dozens of different views being created - many that will only be used one time. I guess I'll have to look more into creating and deleting views via the API. – ZaxR Jun 08 '18 at 22:34
0

First, use standard SQL, not legacy SQL.

I think I would do this as:

select t.*
from (query) q cross join
     (select 1 union all select 2 union all select 3);

The cross join will multiply each row the number of times in the second select.

EDIT:

Having to use legacy SQL is really sad. But something like this should work:

select q.*
from (<query>) q CROSS JOIN
     (select n
      from (SELECT 1 as n), (SELECT 2 as n), (SELECT 3 as n)
     ) n
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Unfortunately I'm restricted to legacy SQL for a couple of other operations. Is there a variant of this that would work with legacy? – ZaxR Jun 07 '18 at 02:05
  • Thanks for the update. This isn't quite what I'm looking for. I'm looking to take query 1 and add rows to that table that are grouped aggregations of query 1 (see example above) - I might just be missing it, but how can I modify the above to accomplish that? – ZaxR Jun 07 '18 at 03:22