2

Using the GDELT public database in Google query, I am trying to find the top themes associated with Israeli Prime Minister Benjamin Netanyahu around March 3, 2015.

I used the following SQL query

SELECT theme, COUNT(*) as count
FROM (
select REGEXP_REPLACE(SPLIT(V2Themes,';'), r',.*', '') theme
from `gdelt-bq.gdeltv2.gkg`
where DATE>20150302000000 and DATE < 20150304000000 and V2Persons like '%Netanyahu%'
)
group by theme
ORDER BY 2 DESC
LIMIT 300

The split is necessary because the V2Themes column uses nested listings. I then want to remove the character offset. This should give me the following:

Row theme               count    
1   GENERAL_GOVERNMENT  33677    
2   LEADER              33405    
3   TAX_FNCACT_MINISTER 31174
4   ....                 ...

But I get an error instead:

No matching signature for function REGEXP_REPLACE for argument types: ARRAY<STRING>, STRING, STRING.    

I understand that SPLIT() creates an array of strings, but I do not know how to resolve this issue.

Is there another function I should use, or can this problem be resolved in another way?

*EDIT

The query works when it is run with Legacy SQL (also need to change the quotation marks to squared brackets). How can I achieve the same using standard SQL?

blong
  • 2,815
  • 8
  • 44
  • 110

1 Answers1

1

Below is for BigQuery Standard SQL

#standardSQL
SELECT theme, COUNT(*) AS COUNT
FROM (
  SELECT REGEXP_REPLACE(value, r',.*', '') theme
  FROM `gdelt-bq.gdeltv2.gkg`, UNNEST(SPLIT(V2Themes,';')) value
  WHERE DATE>20150302000000 AND DATE < 20150304000000 
  AND V2Persons LIKE '%Netanyahu%'
)
GROUP BY theme
ORDER BY 2 DESC
LIMIT 300  

Be aware of following: when you run query and it fails - the cost for you is zero (in most common cases).

This is not a case when your query is correct (as above one in this answer) - in such cases query will successfully complete and you will be billed as per

in Classic UI:

enter image description here

or in BigQuery Console:

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Thanks, that's great. I am using a free trial version - I also do not see an 'estimated cost', should I worry that these queries are costing me money? –  Dec 05 '19 at 16:19
  • if you care about $$ - you sure should - for estimated cost (and many other extras) check [BigQuery Mate](https://chrome.google.com/webstore/detail/bigquery-mate/nepgdloeceldecnoaaegljlichnfognh) – Mikhail Berlyant Dec 05 '19 at 16:22
  • just to add to my previous comment - even though you use free trial - it has credit cap of $500 - so if you will reach this before 12 month free trial duration - your free trial will expire ... – Mikhail Berlyant Dec 05 '19 at 17:47
  • I saw this - the credit cap is actually $300, and I've reached that already... –  Dec 05 '19 at 17:53
  • ok. you are right - $300! so even more important now for you to monitor your cost interactively in real time. so you should definitely try BQ Mate. I created it back in 2015 for then only available Classic UI - with quite a number of extra features - since then - more than 12K users used it. Recently- by quite popular demand, I added support for New UI (BQ Console) - but only for cost monitoring feature – Mikhail Berlyant Dec 05 '19 at 18:02
  • great. please check my profile to see my affiliation to this and few other BQ / GCP related products/tools :o) – Mikhail Berlyant Dec 05 '19 at 18:19