Using the GDELT public database in Google query, I am trying to find the top themes associated with Israeli Prime Minister Benjamin Netanyahu around March 3, 2015.
I used the following SQL query
SELECT theme, COUNT(*) as count
FROM (
select REGEXP_REPLACE(SPLIT(V2Themes,';'), r',.*', '') theme
from `gdelt-bq.gdeltv2.gkg`
where DATE>20150302000000 and DATE < 20150304000000 and V2Persons like '%Netanyahu%'
)
group by theme
ORDER BY 2 DESC
LIMIT 300
The split is necessary because the V2Themes column uses nested listings. I then want to remove the character offset. This should give me the following:
Row theme count
1 GENERAL_GOVERNMENT 33677
2 LEADER 33405
3 TAX_FNCACT_MINISTER 31174
4 .... ...
But I get an error instead:
No matching signature for function REGEXP_REPLACE for argument types: ARRAY<STRING>, STRING, STRING.
I understand that SPLIT() creates an array of strings, but I do not know how to resolve this issue.
Is there another function I should use, or can this problem be resolved in another way?
*EDIT
The query works when it is run with Legacy SQL (also need to change the quotation marks to squared brackets). How can I achieve the same using standard SQL?