createTempView
on two dataframes
then using sql query we can filter the only required date.
Example:
Option1: using createTempView:
df1.show()
//+----------+
//| Maxdate|
//+----------+
//|2020-01-01|
//+----------+
df2.show()
//+----------+----+----+
//| colDate|col1|col2|
//+----------+----+----+
//|2020-01-01| A| B|
//|2020-01-03| C| D|
//+----------+----+----+
df1.createOrReplaceTempView("tmp")
df2.createOrReplaceTempView("tmp1")
sql("select * from tmp1 where coldate > (select maxdate from tmp)").show()
//+----------+----+----+
//| colDate|col1|col2|
//+----------+----+----+
//|2020-01-03| C| D|
//+----------+----+----+
Option-2:Using dataframe variable:
Another way would be storing into variable then using the variable then use the variable in dataframe filter
.
val max_val=df1.collect()(0)(0).toString
df2.filter(col("colDate") > max_val).show()
//+----------+----+----+
//| colDate|col1|col2|
//+----------+----+----+
//|2020-01-03| C| D|
//+----------+----+----+
Option-3:Using dataframe crossJoin and expr:
In this case we are not creating variable instead using dataframe column to filter only the required rows.
df2.crossJoin(df1).
filter(expr("colDate > Maxdate")).
drop("Maxdate").
show()
//+----------+----+----+
//| colDate|col1|col2|
//+----------+----+----+
//|2020-01-03| C| D|
//+----------+----+----+