0

I'm wanting to use GDELT to obtain a CSV containing a count of the number of articles containing a specific theme for all countries for a given number of years. Please note it's not quite the same as the events database, I'm specifically interested in the themes). There are some tutorials on the GDELT website (here), but they appear to be a little out of date regarding the regexp syntax and I'm not super familiar with SQL. Ideally, my output should look something like the following :

Year Country      Count
2001 Afghanistan  34234
2002 Afghanistan  11864
...
2001 Zambia        939
2002 Zambia        864

From my understanding and the tutorial, this code counts the number of articles by theme and day of the year rather than year (taken from here). It's close to what I want, but not quite there.

select date(_partitiontime) date, count(theme) occurences
from `gdelt-bq.gdeltv2.gkg_partitioned`, unnest(split(themes,';')) as theme 
where _partitiontime >= "2020-11-01 00:00:00" and _partitiontime < "2020-11-07 00:00:00" 
and lower(theme) like "%bitcoin%"
group by date
-- order by date  

I think I need to

  1. Add something to return and group by V2Locations at the country level
  2. Parse the data to get only the year

but am not sure how to do it. Any help would be greatly appreciated. Thanks!

thehand0
  • 1,123
  • 4
  • 14
  • It is a query to return data from the GDELT database. I have already provided an example of the desired results. – thehand0 Feb 23 '21 at 23:04
  • . . It might surprise you but most people will have no idea what is in that database. You are much more likely to get an answer if you provide sample data. – Gordon Linoff Feb 23 '21 at 23:09
  • Fair point. I would provide sample data if i could, but unfortunately I cant. It's essentially a massive database of news articles usually accessed by google bigquery (https://www.gdeltproject.org/) – thehand0 Feb 23 '21 at 23:13
  • If you known python you could use gdeltdoc library : https://github.com/alex9smith/gdelt-doc-api. – Jio Jul 27 '22 at 12:47

0 Answers0