We are migrating from Spark 1.6 to Spark 2.4. As part of this, I am planning to rewrite one of our streaming jobs using Structured Streaming.

In the existing streaming job, we join the streaming DF (an RDD converted to a DF) with a blacklist file (which is again a DF). We refresh the blacklist DF every day at 6 AM. But how can we refresh the DF in Spark Structured Streaming? In 1.6 I use the logic below, which refreshes the DF via the RDD API, but I would like to know whether I can get the batch time in Structured Streaming from the DF without converting it to an RDD.

foreachRDD((rdd, time) -> {
      ...
      ...

      // refresh once the batch time has passed the scheduled refresh time
      if (time.milliseconds() >= nextRefreshTime) {
        // refresh the DF
        // set nextRefreshTime = next day 6 AM
      }

    });
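
For reference, here is roughly what I have sketched so far for 2.4. Structured Streaming's `foreachBatch` hands you the micro-batch as a DataFrame plus a batch id (not a timestamp), so this sketch checks the wall clock inside `foreachBatch` and reloads the blacklist once the 6 AM deadline has passed. The input schema, the paths, and the `id` join key below are hypothetical placeholders, not our actual job:

    import java.time.Instant;
    import java.time.ZoneId;
    import java.time.ZonedDateTime;

    import org.apache.spark.api.java.function.VoidFunction2;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.StructType;

    public class BlacklistRefreshSketch {

      // Static fields so the foreachBatch lambda can reassign them
      // (locals captured by a Java lambda must be effectively final).
      private static volatile Dataset<Row> blacklistDF;
      private static volatile long nextRefreshTime;

      // Next 6 AM strictly after nowMillis, in the JVM's default time zone.
      private static long next6am(long nowMillis) {
        ZonedDateTime now = Instant.ofEpochMilli(nowMillis).atZone(ZoneId.systemDefault());
        ZonedDateTime next = now.withHour(6).withMinute(0).withSecond(0).withNano(0);
        if (!next.isAfter(now)) {
          next = next.plusDays(1);
        }
        return next.toInstant().toEpochMilli();
      }

      public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("blacklist-join").getOrCreate();

        blacklistDF = spark.read().parquet("/path/to/blacklist");  // hypothetical path
        nextRefreshTime = next6am(System.currentTimeMillis());

        StructType schema = new StructType()                       // hypothetical schema
            .add("id", "string")
            .add("payload", "string");

        Dataset<Row> events = spark.readStream().schema(schema).json("/path/to/input");

        events.writeStream()
            // Cast avoids the Scala/Java overload ambiguity of foreachBatch in 2.4.
            .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batchDF, batchId) -> {
              // foreachBatch runs on the driver, so a plain wall-clock check works here.
              long now = System.currentTimeMillis();
              if (now >= nextRefreshTime) {
                blacklistDF = spark.read().parquet("/path/to/blacklist");  // reload
                nextRefreshTime = next6am(now);
              }
              // Drop rows whose id appears in the blacklist (left anti join).
              batchDF.join(blacklistDF,
                           batchDF.col("id").equalTo(blacklistDF.col("id")),
                           "left_anti")
                     .write().mode("append").parquet("/path/to/output");   // hypothetical sink
            })
            .option("checkpointLocation", "/path/to/checkpoint")
            .start()
            .awaitTermination();
      }
    }

From what I can tell, a StreamingQueryListener's onQueryProgress event also exposes a batch timestamp, if relying on the wall clock is not acceptable. Is this the right approach, or is there a way to get the batch time directly from the DF?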

Thanks

Kiran
  • For DStreams or Structured Streaming, RDDs and DFs are the same in this respect. I suggest you use a database, S3, or HDFS lookup, since your requirement (once a day) is not too frequent – Ram Ghadiyaram Jul 31 '19 at 04:16
  • [see this as well](https://stackoverflow.com/questions/45281710/refresh-dataframe-in-spark-real-time-streaming-without-stopping-process/45289187) – Ram Ghadiyaram Jul 31 '19 at 04:20
