
I am new to Spark and Scala. I want to keep reading files from a folder and persist the file content in Cassandra. I have written a simple Scala program that uses file streaming to read the file content, but it is not reading files from the specified folder.

Can anybody correct my sample code below?

I am using Windows 7.

Code:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val spark = SparkHelper.getOrCreateSparkSession()
    val ssc = new StreamingContext(spark.sparkContext, Seconds(1))

    // Note: textFileStream only picks up files that are created in the
    // directory after the stream has started; pre-existing files are ignored.
    val lines = ssc.textFileStream("file:///C:/input/")

    lines.foreachRDD(rdd => {
      rdd.foreach(line => println(line))
    })

    ssc.start()
    ssc.awaitTermination()

Gnana
  • Maybe approach the problem through scheduling. If you can't move already processed files out of the directory, you'd have to keep track of what has already been processed. Here are some related questions: https://stackoverflow.com/questions/30375571/running-scheduled-spark-job https://stackoverflow.com/questions/41831708/scheduling-spark-jobs-on-a-timely-basis – Stefan Gloutnikov Mar 25 '18 at 19:05
  • I think I need to use fileStreamcontext to solve my issue. I will reframe my question. – Gnana Mar 25 '18 at 19:08
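Building on the first comment's point about tracking already-processed files: the Structured Streaming file source does that bookkeeping itself, recording what it has read in a checkpoint directory. Below is a minimal sketch of that alternative; the checkpoint path is an assumption, and it prints to the console instead of writing to Cassandra:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("FolderWatcher")
      .getOrCreate()

    // The file source remembers processed files via the checkpoint,
    // so each file in the folder is read only once across restarts.
    val lines = spark.readStream.textFile("file:///C:/input/")

    val query = lines.writeStream
      .format("console")
      .option("checkpointLocation", "file:///C:/checkpoints/folder-watcher")
      .start()

    query.awaitTermination()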

1 Answer


I think a normal Spark job is needed for this scenario rather than Spark Streaming. Spark Streaming is used in cases where the source is something like Kafka or a network port with a continuous inflow of data.
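For example, a plain batch job along these lines could read the folder once and persist the lines to Cassandra. This is a minimal sketch, assuming the spark-cassandra-connector is on the classpath, spark.cassandra.connection.host is configured, and the keyspace and table (my_keyspace.file_lines, hypothetical names) already exist with a matching schema:

    import org.apache.spark.sql.SparkSession

    // Minimal batch job: read every file in the folder and append the
    // lines to a Cassandra table via the spark-cassandra-connector.
    // The keyspace and table names below are hypothetical.
    val spark = SparkSession.builder()
      .appName("FilesToCassandra")
      .getOrCreate()

    // Each line of every file in the directory becomes one row.
    val lines = spark.read.textFile("file:///C:/input/")

    lines.toDF("line")
      .write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "file_lines"))
      .mode("append")
      .save()

    spark.stop()

The trade-off is that this processes everything in the folder on each run, which is why the comments above discuss tracking what has already been processed.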

  • I can use wholeTextFiles. If I don't use streaming, my job ends immediately. How do I keep my job running continuously? It would be helpful if I could get some sample code. – Gnana Mar 26 '18 at 04:19
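One way to keep a non-streaming job alive, as the comment asks, is a polling loop around wholeTextFiles that remembers which files it has already handled. A rough sketch follows; the poll interval and the in-memory bookkeeping are assumptions, and a real job should persist that state (or move files out of the folder) to survive restarts:

    import scala.collection.mutable

    val spark = SparkHelper.getOrCreateSparkSession()
    val sc = spark.sparkContext

    // Paths of files that have already been processed (lost on restart).
    val processed = mutable.Set[String]()

    while (true) {
      // wholeTextFiles returns (path, content) pairs for the directory.
      val newFiles = sc.wholeTextFiles("file:///C:/input/")
        .filter { case (path, _) => !processed.contains(path) }
        .collect()

      newFiles.foreach { case (path, content) =>
        println(s"$path: ${content.length} chars") // persist to Cassandra here
        processed += path
      }

      Thread.sleep(60 * 1000) // poll once a minute
    }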