
I'm trying to download files from the GDELT database with Scala on Databricks. I wrote this code:

%sh
mkdir -p /dbfs/tmp/gdelt
MASTER_URL=http://data.gdeltproject.org/gdeltv2/masterfilelist.txt

# Start from an empty local scratch directory
rm -rf /tmp/gdelt
mkdir -p /tmp/gdelt

echo "Retrieve URLs from [${MASTER_URL}]"
# Third column of the master file list is the URL; keep GKG archives from January 2021
URLS=$(curl "${MASTER_URL}" 2>/dev/null | awk '{print $3}' | grep gkg.csv.zip | grep gdeltv2/202101)
for URL in $URLS; do
    echo "Downloading ${URL}"
    wget "$URL" -O /tmp/gdelt/gdelt.csv.zip > /dev/null 2>&1
    unzip /tmp/gdelt/gdelt.csv.zip -d /tmp/gdelt/ > /dev/null 2>&1
    # Only one extracted CSV exists at a time, so this picks it up
    LATEST_FILE=$(ls -1rt /tmp/gdelt/*.csv | head -1)
    LATEST_NAME=$(basename "${LATEST_FILE}")
    # Copy to DBFS through the /dbfs mount, then clean up the local copies
    cp "$LATEST_FILE" "/dbfs/tmp/gdelt/$LATEST_NAME"
    rm -rf /tmp/gdelt/gdelt.csv.zip
    rm "$LATEST_FILE"
done

Scala code:

import com.aamend.spark.gdelt._
val gdeltDF = spark.read.gdeltGkg("/tmp/gdelt")
gdeltDF.write.format("delta").mode("append").saveAsTable("esg.gdelt")

SQL query:

%sql
SELECT to_date(publishDate) AS date, COUNT(*) 
  FROM esg.gdelt
  GROUP BY date
  ORDER BY date ASC

But I always get this error: Query returned no results. I installed all the packages that I need, but I always get the same error. Has anyone faced this problem?

  • Do you see the files on DBFS? You can use `%fs ls /tmp/gdelt` for that. What does `gdeltDF.count` return? – Alex Ott Feb 03 '22 at 08:40
  • `gdeltDF.count` returns `Long = 0`, but the command `%fs ls /tmp/gdelt` returns the list of CSV files. – Nader bouchnag Feb 03 '22 at 12:21
  • Then I would say that the bug is somewhere in `spark.read.gdeltGkg`: either the file format is incorrect, or something like that. – Alex Ott Feb 03 '22 at 12:48
  • When I use this code: `master_url = 'http://data.gdeltproject.org/gdeltv2/masterfilelist.txt'`, `master_file = urllib.request.urlopen(master_url)`, `min_date = datetime.strptime(getParam('gdelt_raw_min_date'), '%Y%m%d%H%M%S')`, it returns `name 'getParam' is not defined`. Any solution, please? – Nader bouchnag Feb 03 '22 at 12:57
  • https://databricks.com/notebooks/esg_notebooks/02_esg_scoring.html https://databricks-web-files.s3.us-east-2.amazonaws.com/notebooks/esg_scoring/index.html#esg_scoring_3-1.html – Nader bouchnag Feb 03 '22 at 13:00
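
On the `getParam` error in the comments above: the name is presumably defined elsewhere in the notebook that snippet was copied from, so it does not exist when those lines are pasted on their own. Below is a minimal sketch of one possible workaround, assuming a plain dictionary lookup is an acceptable substitute. `CONFIG`, this version of `getParam`, `select_gkg_urls`, and the sample lines (including their made-up sizes and hashes) are all hypothetical and not taken from the original notebook. The filter does roughly what the `awk`/`grep` pipeline in the question does, but compares timestamps instead of matching the `202101` prefix.

```python
from datetime import datetime

# Hypothetical stand-in for the notebook's getParam helper: a plain dict lookup.
# The value below is an assumed example in the %Y%m%d%H%M%S format.
CONFIG = {"gdelt_raw_min_date": "20210101000000"}


def getParam(name):
    """Return a configuration value by name (illustrative replacement only)."""
    return CONFIG[name]


def select_gkg_urls(lines, min_date):
    """Keep GKG archive URLs whose filename timestamp is at or after min_date."""
    urls = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        url = parts[2]  # master file list columns: size, hash, URL
        if not url.endswith(".gkg.csv.zip"):
            continue
        # Filenames start with a 14-digit timestamp, e.g. 20210101000000
        stamp = url.rsplit("/", 1)[-1].split(".")[0]
        try:
            file_date = datetime.strptime(stamp, "%Y%m%d%H%M%S")
        except ValueError:
            continue  # filename does not start with a timestamp
        if file_date >= min_date:
            urls.append(url)
    return urls


min_date = datetime.strptime(getParam("gdelt_raw_min_date"), "%Y%m%d%H%M%S")

# Made-up sample lines in the "size hash url" layout of the master file list
sample = [
    "123 abc http://data.gdeltproject.org/gdeltv2/20201231234500.gkg.csv.zip",
    "456 def http://data.gdeltproject.org/gdeltv2/20210101000000.export.CSV.zip",
    "789 ghi http://data.gdeltproject.org/gdeltv2/20210101000000.gkg.csv.zip",
]

print(select_gkg_urls(sample, min_date))
```

With the real list, each line read from `urllib.request.urlopen(master_url)` is a `bytes` object, so it would need `.decode("utf-8")` before being passed to `select_gkg_urls`.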
