I'm trying to download files from the GDELT database and load them with Scala on Databricks. I wrote this code:
%sh
# Target folder on DBFS for the extracted CSV files
mkdir -p /dbfs/tmp/gdelt
MASTER_URL=http://data.gdeltproject.org/gdeltv2/masterfilelist.txt
# Start from an empty local scratch folder
if [[ -e /tmp/gdelt ]]; then
  rm -rf /tmp/gdelt
fi
mkdir /tmp/gdelt
echo "Retrieve latest URL from [${MASTER_URL}]"
# The third column of the master list is the file URL; keep the GKG archives from January 2021
URLS=$(curl "${MASTER_URL}" 2>/dev/null | awk '{print $3}' | grep gkg.csv.zip | grep gdeltv2/202101)
for URL in $URLS; do
  echo "Downloading ${URL}"
  wget "$URL" -O /tmp/gdelt/gdelt.csv.zip > /dev/null 2>&1
  unzip /tmp/gdelt/gdelt.csv.zip -d /tmp/gdelt/ > /dev/null 2>&1
  echo "Copying extracted file to /dbfs/tmp/gdelt"
  # Only one CSV exists locally at a time, because it is removed at the end of each iteration
  LATEST_FILE=$(ls -1rt /tmp/gdelt/*.csv | head -1)
  LATEST_NAME=$(basename "${LATEST_FILE}")
  cp "$LATEST_FILE" "/dbfs/tmp/gdelt/$LATEST_NAME"
  rm -rf /tmp/gdelt/gdelt.csv.zip
  rm "$LATEST_FILE"
done
Scala code:
import com.aamend.spark.gdelt._
val gdeltDF = spark.read.gdeltGkg("/tmp/gdelt")
gdeltDF.write.format("delta").mode("append").saveAsTable("esg.gdelt")
SQL query:
%sql
SELECT to_date(publishDate) AS date, COUNT(*)
FROM esg.gdelt
GROUP BY date
ORDER BY date ASC
But I always get this error: Query returned no results.
I downloaded all the packages I need, but I always get the same error. Has anyone faced this problem?
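To narrow it down, a check along these lines could be run in a separate %sh cell after the download loop. It is only a sketch: `count_csv` is a hypothetical helper name, and `/dbfs/tmp/gdelt` is assumed to be the folder the loop copies into. Because the download script sends the output of wget and unzip to /dev/null, a count of 0 here would suggest the shell step produced nothing, rather than the Scala or SQL step being at fault.

```shell
#!/usr/bin/env bash
# Count the CSV files directly inside a directory.
# Prints 0 if the directory does not exist or holds no CSV files.
# count_csv is a hypothetical helper name, not part of any Databricks tooling.
count_csv() {
  local dir="$1"
  if [[ ! -d "$dir" ]]; then
    echo 0
    return
  fi
  find "$dir" -maxdepth 1 -type f -name '*.csv' | wc -l | tr -d ' '
}

# Assumed target: the DBFS folder the download loop copies into.
TARGET_DIR="${TARGET_DIR:-/dbfs/tmp/gdelt}"
echo "CSV files in ${TARGET_DIR}: $(count_csv "${TARGET_DIR}")"
```

If the count is greater than 0, the files did reach DBFS and the next thing to look at would be the Scala read and the table write.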