I am trying to construct an economic indicator based on all events with specific cameo codes from gdelt database.
So the idea is to collect data from 1990 to till date and see how economic cooperation varied based on news appearances of certain words. CAMEO codes 0211, 0311, 061, 1011 and 1211 in specific.
My query is how to extract this data for these specific cameo codes. If you can direct me to any source, it would be of great help.
One person suggested me that try using bigquery. I honestly don't know how to navigate the google bigquery page till now (I tried my best probably being from a non-tech background, it was a bit overwhelming for me). If any of you can help with one Cameo code data extraction example then I can play around with other events.
Edit: I am editing to show the progress I have made and the issues I am facing while running the query.
SELECT
*
FROM
[gdelt-bq:full.events]
WHERE
Year >= 1979
AND EventCode IN ('0211', '0311','061', '1011', '1211')
AND Actor1CountryCode != Actor2CountryCode
This query will process 228 GB when run and also excludes the cases where both the country codes are null. It has over 2 million rows and I cant download this as a csv file from bigquery platform.
The part where I need help is the following, is there any way that I can get the total number of events for each event code which satisfies the following conditions
- Actor1Countrycode and Actor2CountryCode should be different except when they are null
- Count for each event code every month which satisfies the above condition.
PS: You can run the code given by Ben P in the answer below to see the number and type of columns in the database.
Edit2: There is another query that I am trying to write where in the AvgTone of an event with a specified code is greater than the average of AvgTone of all events in that particular month. Any leads on how to write this would be really helpful. Suppose, I add a WHERE clause wherein the AvgTone is greater than the average of AvgTone of all events for that particular period (MonthYear in this case). My doubt is how to write this in a query format.
SELECT
MonthYear,
COUNT(*)
FROM
[gdelt-bq:full.events]
WHERE
EventCode IN ('0211',
'0311',
'061')
AND Actor1CountryCode != Actor2CountryCode
AND AvgTone > (SELECT AVG(AvgTone) FROM [gdelt-bq:full.events] GROUP BY MonthYear ORDER BY MonthYear)
GROUP BY
MonthYear
ORDER BY
MonthYear
Error: ELEMENT can only be applied to result with 0 or 1 row.
Can someone help me with the above query? Thanks.