Questions tagged [gdelt]

The Global Database of Events, Language, and Tone (GDELT) project provides more than 40 years of news coverage from sources worldwide.

The GDELT dataset is a vast archive, dating back to 1979.

42 questions
6
votes
4 answers

Understanding Themes in Google BigQuery GDELT GKG 2.0

I'm using Google BigQuery to analyze the GDELT GKG 2.0 dataset and would like to better understand how to query based on themes (or V2Themes). The docs mention a 'Category List' spreadsheet, but so far I've been unsuccessful in finding that list.…
Rutger Hofste
  • 4,073
  • 3
  • 33
  • 44
6
votes
0 answers

OrientDB 2.0.0 Bulk Load Using Java API is CPU-Bound

I'm using OrientDB 2.0.0 to test its handling of bulk data loading. For sample data, I'm using the GDELT dataset from Google's GDELT Project (free download). I'm loading a total of ~80M vertices, each with 8 properties, into the V class of a blank…
Patrick Hoeffel
  • 113
  • 1
  • 9
4
votes
2 answers

Data collection from GDELT using BigQuery

I am trying to construct an economic indicator based on all events with specific CAMEO codes from the GDELT database. The idea is to collect data from 1990 to date and see how economic cooperation varied based on news appearances of certain…
bogathi
  • 43
  • 1
  • 6
4
votes
1 answer

BigQuery Standard Dialect REGEXP_REPLACE input type

I am exploring the power of Google BigQuery with the GDELT database using this tutorial; however, the SQL dialect it uses is 'legacy' and I would like to use the standard dialect. In legacy dialect: SELECT theme, COUNT(*) AS count FROM ( SELECT …
Rutger Hofste
  • 4,073
  • 3
  • 33
  • 44
2
votes
1 answer

Is there a way to remove characters in an array of string in BigQuery?

Using the GDELT public database in Google BigQuery, I am trying to find the top themes associated with Israeli Prime Minister Benjamin Netanyahu around March 3, 2015. I used the following SQL query: SELECT theme, COUNT(*) as count FROM ( select…
user11197093
2
votes
1 answer

How to get more than 6 months of data from GDELT using Google BigQuery

I could not get more than 6 months of data from the GDELT gkg table. For example, this query returns only results from 19 Feb 2015: SELECT Date, SourceCommonName, DocumentIdentifier FROM [gdelt-bq:gdeltv2.gkg] where (date < 20150220000000 and locations…
2
votes
1 answer

Pandas creates DataFrame with first header column in its own row

I am working with the GDELT dataset and am having issues creating a pandas DataFrame using pd.DataFrame.from_csv(path_to_data, sep=","), which seems to load the data fine except for the fact that the first header column is shifted to row 1 like…
cdlm
  • 565
  • 9
  • 20
1
vote
1 answer

How do I search the GDELT database hosted in Google's BigQuery for a keyword, as I can with the GDELT API?

How can I query articles in BigQuery GDELT that make mention of a keyword such as "climate change" (despite the lack of a keyword column)? Admittedly, I am a complete novice at working with databases and may be misunderstanding something simple, so…
Patrick T.
  • 11
  • 2
1
vote
0 answers

Some streams terminated before this command could finish error

I am trying to read streaming data into Azure Databricks. This is the code I've been using: And it's giving me an error saying: My Databricks Runtime is: 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11) and I installed the package:…
1
vote
0 answers

"Query returned no results" GDELT

I'm trying to download files from the GDELT database with Scala using Databricks. I wrote this code: %sh mkdir -p /dbfs/tmp/gdelt MASTER_URL=http://data.gdeltproject.org/gdeltv2/masterfilelist.txt if [[ -e /tmp/gdelt ]] ; then rm -rf…
1
vote
0 answers

Hierarchy Structure of Themes in Google BigQuery GDELT GKG 2.0

We're using Google BigQuery to analyze the GDELT GKG 2.0 dataset and would like to better understand how to query based on themes (or V2Themes). More specifically, we are interested in the hierarchy structure (how hierarchies are created) of the…
1
vote
1 answer

GDELT: count occurrences of specific themes

I am trying to count how often the term "BITCOIN" occurs in the Themes column of the GDELT database, and then group the counts by date. Here is what I have so far: SELECT DATE, SPLIT(RTRIM(Themes,';'),';') themes FROM…
Son
  • 159
  • 1
  • 11
1
vote
1 answer

MySQL float column with empty value ERROR 1265 (01000): Data truncated for column

I'm trying to store the GDELT dataset in a MySQL database (MySQL 8.0, RHEL 7), but it returned an ERROR 1265 (01000) because one float column has empty values in it: CREATE TABLE event ( GlobalEventID INT NOT NULL, Day INT NOT NULL, …
leoce
  • 715
  • 1
  • 8
  • 24
1
vote
1 answer

What is the cause of this BigQuery search error?

SELECT a.name, b.name, COUNT(*) as count FROM (FLATTEN( SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', ")) name FROM [gdelt-bq:gdeltv2.gkg] WHERE DATE>20150302000000 and DATE < 20150304000000 and V2Persons…
1
vote
2 answers

BigQuery: filter according to counts in nested field

I am trying to look up records that have 5 or more mentions of "BE" or "Belgium" in a nested field. The below query does not yield any results: #standardSQL SELECT GKGRECORDID FROM `gdelt-bq.gdeltv2.gkg_partitioned` where _PARTITIONTIME BETWEEN…
Son
  • 159
  • 1
  • 11