I'm new to stream processing (kafka streams / flink / storm / spark / etc.) and trying to figure out the best way to go about handling a real world problem, represented here by a toy example. We are tied to Kafka for our pubsub/data ingestion, but have no particular attachment in terms of stream processor framework/approach.
Theoretically, suppose I have a source emitting floating point values sporadically. Also at any given point there is a multiplier M that should be applied to this source's values; but M can change, and critically, I may only find out about the change much later - possibly not even "in change order."
I am thinking of representing this in Kafka as
"Values": (timestamp, floating point value) - the values from the source, tagged with their emission time.
"Multipliers": (timestamp, floating point multiplier) - indicates M changed to this floating point multiplier at this timestamp.
I would then be tempted to create an output topic, say "Results", using a standard stream processing framework, that joins the two streams, and merely multiplies each value in Values with the current multiplier determined by Multipliers.
However, based on my understanding this is not going to work, because new events posted to Multipliers can have arbitrarily large impact on results already written to the Results stream. Conceptually, I would like to have something like a Results stream that is current as of the last event posted to Multipliers against all values in Values, but which can be "recalculated" as either further Values or Multipliers events come in.
What are some techniques for achieving/architecting this with kafka and major stream processors?
Example:
Initially,
Values = [(1, 2.4), (2, 3.6), (3, 1.0), (5, 2.2)]
Multipliers = [(1, 1.0)]
Results = [(1, 2.4), (2, 3.6), (3, 1.0), (5, 2.2)]
Later,
Values = [(1, 2.4), (2, 3.6), (3, 1.0), (5, 2.2)]
Multipliers = [(1, 1.0), (4, 2.0)]
Results = [(1, 2.4), (2, 3.6), (3, 1.0), (5, 4.4)]
Finally, after yet another event posted to Multipliers (and also a new Value emitted too):
Values = [(1, 2.4), (2, 3.6), (3, 1.0), (5, 2.2), (7, 5.0)]
Multipliers = [(1, 1.0), (4, 2.0), (2, 3.0)]
Results = [(1, 2.4), (2, 10.8), (3, 3.0), (5, 4.4), (7, 10.0)]