I'm playing with Kafka Streams' TopologyTestDriver in order to get our data pipelines tested.
It has worked like a charm with all our simple topologies, including the stateful ones that use stores. My problem arises when I try to use this test driver to test topologies that use windowed aggregation.
I've copied a simple example that sums the integers received with the same key within a 10-second window.
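For reference, this is my understanding of how a tumbling window assigns a record to a bucket: the window start is the largest multiple of the window size that does not exceed the record's timestamp. A plain-Java sketch of that arithmetic (my own illustration, not Kafka's actual implementation):

```java
public class TumblingWindowSketch {

    // A tumbling window of size `sizeMs` assigns a record with timestamp `ts`
    // to the window starting at the largest multiple of `sizeMs` <= `ts`.
    static long windowStart(long ts, long sizeMs) {
        return ts - (ts % sizeMs);
    }

    public static void main(String[] args) {
        long size = 10_000L; // 10-second window, as in the topology below
        System.out.println(windowStart(1L, size));      // 0
        System.out.println(windowStart(9_999L, size));  // 0  (same window as ts=1)
        System.out.println(windowStart(10_000L, size)); // 10000 (next window)
    }
}
```

So both records in my tests (timestamp 1L) fall into the window [0, 10000).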
public class TopologyWindowTests {

    TopologyTestDriver testDriver;
    String INPUT_TOPIC = "INPUT.TOPIC";
    String OUTPUT_TOPIC = "OUTPUT.TOPIC";

    @Before
    public void setup() {
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
        // The topology processes <String, Integer> records,
        // so we set those default serdes
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Integer().getClass());
        testDriver = new TopologyTestDriver(defineTopology(), config, 0L);
    }

    /**
     * Topology test
     */
    @Test
    public void testTopologyNoCorrelation() throws IOException {
        ConsumerRecordFactory<String, Integer> factory =
                new ConsumerRecordFactory<>(INPUT_TOPIC, new StringSerializer(), new IntegerSerializer());
        testDriver.pipeInput(factory.create(INPUT_TOPIC, "k", 2, 1L));

        ProducerRecord<String, Integer> outputRecord =
                testDriver.readOutput(OUTPUT_TOPIC, new StringDeserializer(), new IntegerDeserializer());

        Assert.assertNull(outputRecord);
    }

    @After
    public void tearDown() {
        testDriver.close();
    }

    /**
     * Defines the topology under test
     * @return the topology
     */
    public Topology defineTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, Integer> inputStream = builder.stream(INPUT_TOPIC);
        KTable<Windowed<String>, Integer> groupedMetrics = inputStream
                .groupBy((key, value) -> key, Serialized.with(Serdes.String(), Serdes.Integer()))
                .windowedBy(TimeWindows.of(TimeUnit.SECONDS.toMillis(10)))
                .aggregate(
                        () -> 0,
                        (String aggKey, Integer newValue, Integer aggValue) -> aggValue + newValue,
                        Materialized.<String, Integer, WindowStore<Bytes, byte[]>>as("GROUPING.WINDOW")
                                .withKeySerde(Serdes.String())
                                .withValueSerde(Serdes.Integer())
                );
        groupedMetrics.toStream()
                .map((key, value) -> KeyValue.pair(key.key(), value))
                .to(OUTPUT_TOPIC);
        return builder.build();
    }
}
I would expect that in this test case nothing is returned to the output topic unless I advance the wall-clock time by 10 seconds... But I'm getting the following output:
java.lang.AssertionError: expected null, but was:<ProducerRecord(topic=OUTPUT.TOPIC, partition=null, headers=RecordHeaders(headers = [], isReadOnly = false), key=k, value=2, timestamp=0)>
Am I missing something here? I'm using Kafka 2.0.0.
Thanks in advance.
UPDATE
According to Matthias's response, I've prepared the following test:
@Test
public void testTopologyNoCorrelation() throws IOException {
    ConsumerRecordFactory<String, Integer> factory =
            new ConsumerRecordFactory<>(INPUT_TOPIC, new StringSerializer(), new IntegerSerializer());
    testDriver.pipeInput(factory.create(INPUT_TOPIC, "k", 2, 1L));
    testDriver.pipeInput(factory.create(INPUT_TOPIC, "k", 2, 1L));

    // Testing 2 + 2 = 4
    ProducerRecord<String, Integer> outputRecord1 =
            testDriver.readOutput(OUTPUT_TOPIC, new StringDeserializer(), new IntegerDeserializer());
    Assert.assertEquals(Integer.valueOf(4), outputRecord1.value());

    // Testing that there are no more events in the window
    ProducerRecord<String, Integer> outputRecord2 =
            testDriver.readOutput(OUTPUT_TOPIC, new StringDeserializer(), new IntegerDeserializer());
    Assert.assertNull(outputRecord2);
}
Both input messages have been sent with the same timestamp, so I was expecting only one event in the output topic with the sum of my values. However, I'm receiving 2 events in the output (the first one with a value of 2, and the second one with a value of 4), which I think is not the desired behaviour of the topology.
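In case it helps clarify what I'm seeing: the output looks exactly like what a per-record update would produce, i.e. the aggregate is forwarded downstream after every single input record instead of once per closed window. A plain-Java simulation of that behaviour (my own sketch, not Kafka code) reproduces the two events I read from the output topic:

```java
import java.util.ArrayList;
import java.util.List;

public class ContinuousRefinementSketch {

    // Simulates a windowed sum that emits the updated aggregate after
    // every single input record (no buffering until the window closes).
    static List<Integer> pipe(int... values) {
        List<Integer> emitted = new ArrayList<>();
        int aggregate = 0;
        for (int v : values) {
            aggregate += v;
            emitted.add(aggregate); // one downstream update per input record
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Same input as my test: two records with value 2 in the same window.
        System.out.println(pipe(2, 2)); // [2, 4] -- matches the two records I read
    }
}
```

This matches the observed output (a first event with value 2, then a second with value 4), whereas I was expecting a single event with value 4.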