We are migrating from Spark 1.6 to Spark 2.4, and as part of this I am planning to rewrite one of our streaming jobs to use Structured Streaming.
In the existing job we join the streaming DataFrame (an RDD converted to a DataFrame) against a blacklist file (also loaded as a DataFrame), and we refresh the blacklist DataFrame every day at 6 AM. How can I refresh such a DataFrame in Structured Streaming? In 1.6 I refresh it with the logic below, which relies on the RDD API. I would like to know whether I can get the batch time in Structured Streaming directly from the DataFrame, without converting it to an RDD.
foreachRDD((rdd, time) -> {
    ...
    ...
    // refresh once the batch time has passed the scheduled refresh time
    if (time.milliseconds() >= nextRefreshTime) {
        // refresh the blacklist DF
        // set nextRefreshTime = next day 6 AM
    }
});
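For what it's worth, the scheduling part of this (computing the next 6 AM instant and comparing it to a batch timestamp in epoch millis) is independent of which streaming API is used. This is a minimal sketch of that calculation with plain java.time; the class and method names are my own, not anything from Spark:

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class RefreshSchedule {

    // Returns the epoch-millis of the next 6 AM at or after the day of
    // 'nowMillis' (today's 6 AM if it has not passed yet, otherwise tomorrow's).
    static long nextRefreshTime(long nowMillis, ZoneId zone) {
        ZonedDateTime now = Instant.ofEpochMilli(nowMillis).atZone(zone);
        ZonedDateTime todaySix = now.toLocalDate().atTime(LocalTime.of(6, 0)).atZone(zone);
        ZonedDateTime next = now.isBefore(todaySix) ? todaySix : todaySix.plusDays(1);
        return next.toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC");
        // A batch arriving at 10:00 should schedule the refresh for 06:00 the next day.
        long batchTime = ZonedDateTime.of(2024, 1, 15, 10, 0, 0, 0, utc).toInstant().toEpochMilli();
        long next = nextRefreshTime(batchTime, utc);
        System.out.println(next == ZonedDateTime.of(2024, 1, 16, 6, 0, 0, 0, utc).toInstant().toEpochMilli());
    }
}
```

Inside the batch callback the check would then be `if (batchTimeMillis >= nextRefreshTime) { /* reload blacklist, recompute nextRefreshTime */ }`, same as in the 1.6 code above.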
Thanks