
I'm using PySpark Structured Streaming with Kafka as the reader stream. For unit testing, I would like to replace the Kafka reader stream with a mock. How can I do this?

The following question is equivalent to mine, but I use Python (PySpark) instead of Scala, and I couldn't find MemoryStream in PySpark:

How to perform Unit testing on Spark Structured Streaming?

Ronen491
  • I don't think you need to directly use `MemoryStream`, you can do `df.writeStream.queryName("aggregates").outputMode("complete") .format("memory").start()` in python. See http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html – Rayan Ral Jul 19 '20 at 19:14
  • 1
    @RayanRal thanks for your reply. AFAIU, your snippet allows me to mock the writeStream (output) of the streaming while I'm trying to mock the readerStream (input). – Ronen491 Jul 20 '20 at 06:34
  • @Ronen491 haven't you found an answer yet? – VB_ Feb 15 '21 at 11:01
  • @VB_ I ended up with creating a json file for every message I would stream through kafka and used something like: `df = spark.readStream.format('json').option('maxFilesPerTrigger', 1).schema(StructType([StructField('key', StringType(), True), StructField('value', StringType(), True)])).load('*.json')` – Ronen491 Feb 15 '21 at 14:02

0 Answers