2

I'm trying to make a currently working SQL query that I have in BigQuery more streamlines and am running into the following issue:


Error: ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name. Consider using Standard SQL .google.com/bigquery/docs/reference/standard-sql/), which allows non-equality JOINs and comparisons involving expressions and residual predicates.


Below is the query that is giving the error above. The first LEFT JOIN works. When I added the second one, right below, I started getting the error. What I'm trying to do is get the human readable own.o.firstname and own.o.lastname values rather than the owner_id value of the deal record (o.properties.hubspot_owner_id.value), but in order to do so I need to join some tables.

I had to use CAST on the ON clause of the second JOIN because the fields are of different types in each table's respective schema. If I don't do that, I get the following error: Error: Join keys o.properties.hubspot_owner_id.value (string) and o.ownerid (int64) have types that cannot be automatically coerced.

The WHERE clause is just a suppression list to not return entries that have been deleted from the database.

SELECT o.*
FROM (
  SELECT
    o.dealid,
    o.properties.dealname.value,
    stages.Label,
    o.properties.closedate.value,
    o.properties.hubspot_owner_id.value,
    own.o.firstname,
    own.o.lastname,
    o.properties.amount.value,
    o.properties.createdate.value,
    o.properties.pipeline.value,
    o.associations.associatedcompanyids,
    ROW_NUMBER() OVER (PARTITION BY o.dealid ORDER BY o._sdc_batched_at DESC) as seqnum
  FROM [sample-table:hubspot.deals] o
  LEFT JOIN [sample-table:hubspot.sales_stages_lookup] stages ON o.properties.dealstage.value = stages.Internal_Value
  LEFT JOIN [sample-table:hubspot.owners_reporting] own ON CAST(o.properties.hubspot_owner_id.value AS INTEGER) = CAST(own.o.ownerid AS INTEGER)) o
WHERE o.dealid NOT IN (SELECT objectid FROM [sample-table:hubspot_suppression_list.data] WHERE subscriptiontype = 'deal.deletion') AND seqnum = 1
fdc
  • 75
  • 1
  • 9

1 Answers1

1

Use standard SQL in BigQuery instead, which supports expressions as part of the ON clause:

#standardSQL
SELECT o.*
FROM (
  SELECT
    o.dealid,
    o.properties.dealname.value AS dealname_value,
    stages.Label,
    o.properties.closedate.value AS closedate_value,
    o.properties.hubspot_owner_id.value AS hubspot_owner_id_value,
    own.o.firstname,
    own.o.lastname,
    o.properties.amount.value AS amount_value,
    o.properties.createdate.value AS createdate_value,
    o.properties.pipeline.value AS pipeline_value,
    o.associations.associatedcompanyids,
    ROW_NUMBER() OVER (PARTITION BY o.dealid ORDER BY o._sdc_batched_at DESC) as seqnum
  FROM `sample-table.hubspot.deals` o
  LEFT JOIN `sample-table.hubspot.sales_stages_lookup` stages ON o.properties.dealstage.value = stages.Internal_Value
  LEFT JOIN `sample-table.hubspot.owners_reporting` own ON CAST(o.properties.hubspot_owner_id.value AS INT64) = CAST(own.o.ownerid AS INT64)) o
WHERE o.dealid NOT IN (SELECT objectid FROM `sample-table.hubspot_suppression_list.data` WHERE subscriptiontype = 'deal.deletion') AND seqnum = 1

For more on the differences between legacy and standard SQL in BigQuery, see the migration guide.

Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99
  • thank you for the answer. Is the expression what the problem is in the query? – fdc Jun 28 '18 at 23:08
  • Yes. Using standard SQL is recommended instead since it doesn't have this or other limitations. – Elliott Brossard Jun 28 '18 at 23:31
  • I just ran the query using standard SQL and got the following error... `Duplicate column names in the result are not supported. Found duplicate(s): value`. I've tried a bunch of modifications to the query, but I still can't figure it out. – fdc Jun 29 '18 at 03:07
  • See if this edit helps. The problem is that the query produces multiple columns in the result named `value`, but you can give them different names to resolve the error. – Elliott Brossard Jun 29 '18 at 03:21
  • That did it! I also realized that the inner query was returning a lot of columns named `value` and ended up using aliases for each of them via `AS`. It fixed it and ended up making the column names more meaningful anyway. Thank you for all of the help - it is sincerely appreciated! – fdc Jun 29 '18 at 15:11