I have an app that exports files to an S3 bucket at regular intervals. I need to develop a Spark Streaming app that reads from this bucket and delivers the lines of any new files every 30 seconds.

I have read this post, which helped me understand how to handle the credentials, but it still doesn’t address my needs.

Q1. Could anyone provide some code or a hint on how to do this? I’ve seen the Twitter example, but I couldn’t figure out how to apply it to my scenario.

Q2. How does Spark Streaming know which file it streamed last before picking up the next one? Is this based on the file’s LastModified header or some other timestamp?

Q3. If the cluster goes down, how can I resume streaming from where I left off?

Thanks in advance!!

EasyB
  • I'm not sure you can do that out of the box with Spark Streaming. I would simply implement this myself with a state file, a while loop and `sc.textFile`; it's probably around 10 lines of code. I can show you if you want? – samthebest Aug 04 '14 at 09:11
  • Spark Streaming does have [a way to monitor a directory for new files](https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-sources): `StreamingContext.fileStream()` – Nick Chammas Aug 04 '14 at 16:18
  • Thanks Nick, I will take a look. If s3n is a Hadoop-compatible FS it could work; as stated in the link you provided: "Spark Streaming will monitor the directory dataDirectory for any Hadoop-compatible filesystem and process any files created in that directory." – EasyB Aug 04 '14 at 17:08
  • Yep, it is, so I would expect it to work. (By the way, make sure to [@mention](http://meta.stackoverflow.com/editing-help#comment-reply) people you are replying to so they get notified!) – Nick Chammas Aug 04 '14 at 20:41
  • Yes @samthebest show me, that would be great! – EasyB Aug 04 '14 at 21:53
  • @EasyB Well, try `StreamingContext.fileStream` first, as that appears to be a more direct answer. You'll just need to build in some state-recording logic to answer Q3. – samthebest Aug 05 '14 at 17:32
  • What was the result of this? – ndtreviv Oct 06 '16 at 14:20
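
No code was ever posted in this thread, so here is a rough sketch of the "state file plus loop" idea suggested in the comments. It is only an illustration under several assumptions: a local directory stands in for the S3 bucket, the function names (`load_state`, `save_state`, `poll_once`, `run_forever`) and the JSON state format are made up for this example, and no Spark calls are involved. In a Spark job, the per-file reading could presumably be swapped for `sc.textFile` on each new path.

```python
import json
import os
import time
from pathlib import Path


def load_state(state_path):
    """Return the set of file names already processed (empty on first run)."""
    if os.path.exists(state_path):
        with open(state_path) as fh:
            return set(json.load(fh))
    return set()


def save_state(state_path, seen):
    """Write to a temp file, then rename, so a crash cannot leave partial state."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(sorted(seen), fh)
    os.replace(tmp_path, state_path)


def poll_once(data_dir, state_path, handle_line):
    """Process files not yet listed in the state file, oldest first.

    Assumption: state_path lives outside data_dir, so the state file
    itself is never picked up as input.
    """
    seen = load_state(state_path)
    candidates = [p for p in Path(data_dir).iterdir()
                  if p.is_file() and p.name not in seen]
    new_files = sorted(candidates, key=lambda p: (p.stat().st_mtime, p.name))
    for path in new_files:
        with open(path) as fh:
            for line in fh:
                handle_line(line.rstrip("\n"))
        seen.add(path.name)
        save_state(state_path, seen)  # record progress after each file
    return [p.name for p in new_files]


def run_forever(data_dir, state_path, handle_line, interval_seconds=30):
    """Poll in a loop; after a restart, the state file says where to resume."""
    while True:
        poll_once(data_dir, state_path, handle_line)
        time.sleep(interval_seconds)
```

Because progress is saved only after a file has been fully read, a crash in the middle of a file means that file is read again on restart. That gives at-least-once delivery, which may or may not be acceptable for Q3.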

0 Answers