
I am developing an API that reads data from Kafka and writes to blob storage, in Java, using Spark Structured Streaming. I could not find a way to write unit tests for it. I have a reader class that returns a Dataset and a writer class that takes a Dataset as input and writes it to blob storage in the specified format. I saw some blogs on MemoryStream, but I don't think it will cover my case.

Thanks in advance.

  • You need to be a little more precise: which part of the code do you want to unit-test? Reading from Kafka, writing to blob storage (which type?), or some intermediate transform you might be applying? Depending on the part, the unit test will not be the same. And for the input (Kafka) and the output (storage) you will need to mock the outside system for the test – Juh_ Dec 01 '20 at 18:07

1 Answer


You can refer to this answer on how memory streams can be used for unit testing: Unit Test - structured streaming

You can also look at spark-testing-base from Holden Karau: Spark testing base. A rough sketch of how it might be used from Java follows.
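This sketch assumes spark-testing-base's JavaDataFrameSuiteBase helper, which spins up a local Spark context for the suite and adds DataFrame equality assertions; the test class name and the sample data here are illustrative, not from your code:

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.junit.Test;
import com.holdenkarau.spark.testing.JavaDataFrameSuiteBase;

public class TransformationTest extends JavaDataFrameSuiteBase {
    @Test
    public void transformationPreservesData() {
        // sqlContext() is provided by the base class
        Dataset<Row> expected = sqlContext()
            .createDataset(Arrays.asList("1,1", "2,2"), Encoders.STRING()).toDF();
        // assertDataFrameEquals compares schemas and row contents
        assertDataFrameEquals(expected, expected);
    }
}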

You can then mock the streaming DataFrame coming from Kafka and run test cases against the transformations your code applies on top of that DataFrame.

Sample:

import java.util.Arrays;
import org.apache.spark.sql.*;
import org.apache.spark.sql.execution.streaming.MemoryStream;
import scala.collection.JavaConverters;

static Dataset<Row> createTestStreamingDataFrame() {
    // MemoryStream is Spark's in-memory test source; sqlContext() is assumed to return the test's SQLContext
    MemoryStream<String> testStream = new MemoryStream<String>(100, sqlContext(), Encoders.STRING());
    // addData expects a Scala Seq, so convert the Java list first
    testStream.addData(JavaConverters.asScalaBuffer(Arrays.asList("1,1", "2,2", "3,3")).toSeq());
    return testStream.toDF().selectExpr(
        "cast(split(value,'[,]')[0] as int) as testCol1",
        "cast(split(value,'[,]')[1] as int) as testCol2");
}
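For completeness, the mocked stream can then be exercised end to end by writing to Spark's memory sink and asserting on the collected rows. This is a minimal sketch: the JUnit setup, the spark SparkSession field, and the testOutput query name are assumptions, and the selectExpr stands in for whatever transformation your code actually applies:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

@Test
public void testTransformation() throws Exception {
    Dataset<Row> input = createTestStreamingDataFrame();
    // stand-in for the transformation under test
    Dataset<Row> transformed = input.selectExpr("testCol1 + testCol2 as total");

    StreamingQuery query = transformed.writeStream()
        .format("memory")           // in-memory sink, for tests only
        .queryName("testOutput")    // exposed as a queryable temp table
        .outputMode("append")
        .start();
    query.processAllAvailable();    // block until all buffered input is processed

    // spark is assumed to be the test's SparkSession
    List<Integer> totals = spark.sql("select total from testOutput order by total")
        .collectAsList().stream().map(r -> r.getInt(0)).collect(Collectors.toList());
    assertEquals(Arrays.asList(2, 4, 6), totals);
    query.stop();
}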
Shane