I have an Apache access log file that already contains some data and is continuously growing. I want to analyze that data using the Apache Spark Streaming API.
Spark is new to me. I created a program that uses jssc.textFileStream(directory)
to read the log data, but it does not work as I expected.
Please suggest some approaches for analyzing this log file with Spark.
Here is my code:
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

SparkConf conf = new SparkConf()
    .setMaster("spark://192.168.1.9:7077")
    .setAppName("log streaming")
    .setSparkHome("/usr/local/spark")
    .setJars(new String[] { "target/sparkstreamingdemo-0.0.1.jar" });

// 5-second batch interval
JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(5000));

// Watch the directory for newly created files
JavaDStream<String> lines = jssc.textFileStream("/home/user/logs");
lines.print();

jssc.start();
jssc.awaitTermination();
This code does not return any data from the files that already exist in the directory. It only works when I create a new file there, and even then, when I append more lines to that file, the program does not return the newly added data.
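From what I understand of the documentation, textFileStream only picks up files that appear in the directory after the stream starts, and each file is read once, so appended lines are never seen. As a minimal sketch of a workaround I am considering (based on the socket example in the Spark quick start; the host localhost, port 9999, and the class name are my own placeholders), I could pipe the growing log over a socket with tail -F and netcat, and read it with socketTextStream:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class SocketLogStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
            .setMaster("local[2]")      // local mode for testing; a cluster URL should also work
            .setAppName("log streaming");

        // 5-second batch interval, as in my original program
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(5000));

        // Before starting this job, feed the log over a socket with:
        //   tail -F /home/user/logs/access.log | nc -lk 9999
        // Host and port here are placeholders.
        JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        // Example analysis: count the log lines received in each 5-second batch
        lines.count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}

Would something like this be a reasonable way to handle a file that is appended to, or is there a better approach with Spark Streaming?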