I am working on a Spark Streaming job that needs to store intermediate results so they can be reused in the next window of the stream. The amount of data is extremely large, so there is probably no way to keep it in Spark's cache. In addition, I need some way to read the data back by a 'key'. I was thinking about Cassandra as the intermediate storage, but it also has some drawbacks. Alternatively, maybe Kafka would do the job, but it would require additional work to select a given portion of the data by key.
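To make it concrete, here is a minimal sketch of the shape of my pipeline (the socket source, the key extraction, and the per-key counter are all simplified placeholders; my real per-key state is far bigger). It uses Spark Streaming's built-in `updateStateByKey`, which is the obvious in-framework option I've looked at:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object IntermediateStateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("intermediate-state-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30))
    ssc.checkpoint("hdfs:///tmp/checkpoints") // updateStateByKey requires checkpointing

    // Hypothetical keyed input: (key, raw event), e.g. keyed by the first CSV field
    val events = ssc.socketTextStream("localhost", 9999)
      .map(line => (line.split(",")(0), line))

    // Keeps running state per key across batches, so the next window can reuse it.
    // The catch: this state lives inside Spark (memory + checkpoint files),
    // which is exactly what doesn't work for me at my data volume.
    val state = events.updateStateByKey[Long] { (newValues: Seq[String], prev: Option[Long]) =>
      Some(prev.getOrElse(0L) + newValues.size)
    }

    state.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Since `updateStateByKey` keeps everything inside Spark, it doesn't really solve my problem, which is why I'm considering an external keyed store such as Cassandra or Kafka instead.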
Could you advise me on what I should do? How are such problems resolved in Storm - is there an internal mechanism, or is it preferred to use some external tools?