I have an empty table in Hive
I mean there are no records in that table.
Using this empty table I have created a data frame
in pyspark
df = sqlContext.table("testing.123_test")
I have registered this data frame
as an temp table in
df.registerTempTable('mytempTable')
date=datetime.now().strftime('%Y-%m-%d %H:%M:%S')
In this table I have column called id
.
Now I want to query the temp table like below
min_id = sqlContext.sql("select nvl(min(id),0) as minval from mytempTable").collect()[0].asDict()['minval']
max_id = sqlContext.sql("select nvl(max(id),0) as maxval from mytempTable").collect()[0].asDict()['maxval']
Now I want to save date
, min_id
and max_id
into a file in HDFS
I have done like below:
from pyspark.sql import functions as f
(sqlContext.table("myTempTable").select(f.concat_ws(",", f.first(f.lit(date)), f.min("id"), f.max("id"))).coalesce(1).write.format("text").mode("append").save("/tmp/fooo"))
Now when I check the file in HDFS
it show all NULL values.
The file output in HDFS
is below.
NULL,NULL,NULL
What I want is
Date,0,0
Here date is the current timestamp
How can I achieve what I want.