Scenario: I have a service that logs events like the ones in this CSV example:
#TimeStamp, Name, ColorOfPullover
TimeStamp01, Peter, Green
TimeStamp02, Bob, Blue
TimeStamp03, Peter, Green
TimeStamp04, Peter, Red
TimeStamp05, Peter, Green
Events such as "Peter wears Green" will often occur many times in a row.
I have two goals:
- Keep the data as small as possible
- Keep all the relevant data
Relevant means: I need to know during which time spans a person was wearing which color. E.g.:
#StartTime, EndTime, Name, ColorOfPullover
TimeStamp01, TimeStamp03, Peter, Green
TimeStamp02, TimeStamp02, Bob, Blue
TimeStamp04, TimeStamp04, Peter, Red
TimeStamp05, TimeStamp05, Peter, Green
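Going from the event log to the interval table is essentially run-length encoding of each person's sequence of colors. A minimal batch sketch, assuming the whole log fits in memory and is already in time order; `compress` is a hypothetical helper name, and timestamps are compared as strings (which works for the zero-padded TimeStampNN labels here, or for ISO-8601 dates):

```python
from itertools import groupby
from operator import itemgetter

# (timestamp, name, color) rows from the CSV log.
# Assumes the log is in time order and timestamps sort correctly as strings.
events = [
    ("TimeStamp01", "Peter", "Green"),
    ("TimeStamp02", "Bob", "Blue"),
    ("TimeStamp03", "Peter", "Green"),
    ("TimeStamp04", "Peter", "Red"),
    ("TimeStamp05", "Peter", "Green"),
]


def compress(events):
    """Run-length encode each person's color sequence into
    (start, end, name, color) intervals."""
    by_person = {}
    for ts, name, color in events:
        by_person.setdefault(name, []).append((ts, color))

    intervals = []
    for name, history in by_person.items():
        # groupby merges only *adjacent* equal colors, i.e. consecutive runs.
        for color, run in groupby(history, key=itemgetter(1)):
            run = list(run)
            intervals.append((run[0][0], run[-1][0], name, color))

    # Restore chronological order by interval start time.
    intervals.sort(key=itemgetter(0))
    return intervals


for row in compress(events):
    print(", ".join(row))
```

Peter's Green events at TimeStamp01 and TimeStamp03 collapse into one row, while the Green event at TimeStamp05 starts a new row because a Red event came in between.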
In this format, I can answer questions like: Which color was Peter wearing at TimeStamp02? (I can safely assume that a person wore the same color for the whole time between two consecutive logged events that report that color.)
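Answering such a question from the compressed rows is a range lookup per person. A minimal sketch, assuming the interval rows the log compresses to are held in a list and that timestamps compare correctly as strings; `color_at` and `INTERVALS` are hypothetical names:

```python
# Intervals the example log compresses to: (start, end, name, color).
# Assumes timestamps compare correctly as strings (true for the
# zero-padded TimeStampNN labels here, or for ISO-8601 dates).
INTERVALS = [
    ("TimeStamp01", "TimeStamp03", "Peter", "Green"),
    ("TimeStamp02", "TimeStamp02", "Bob", "Blue"),
    ("TimeStamp04", "TimeStamp04", "Peter", "Red"),
    ("TimeStamp05", "TimeStamp05", "Peter", "Green"),
]


def color_at(intervals, name, ts):
    """Return the color `name` wore at `ts`, or None if `ts` lies outside
    all of that person's intervals (e.g. between two different colors)."""
    for start, end, who, color in intervals:
        if who == name and start <= ts <= end:
            return color
    return None


print(color_at(INTERVALS, "Peter", "TimeStamp02"))  # Green
```

A linear scan is enough to illustrate the idea; with many rows, an index on (name, start) lets a database find the candidate interval directly.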
Main question: Is there an existing technology that can do this? I.e. something I can feed a continuous stream of events, and which extracts and stores the relevant data?
To be precise, I need to implement an algorithm like the following (pseudocode). The OnNewEvent method is called for each line of the CSV example, and its parameter event already holds the data from that line as member variables.
def OnNewEvent(event)
    entry = Database.getLatestEntryFor(event.personName)
    if (entry != null and entry.pulloverColor == event.pulloverColor)
        # Same color as the person's latest interval: extend it
        entry.setIntervalEndDate(event.date)
        Database.store(entry)
    else
        # First event for this person, or the color changed: start a new interval
        newEntry = new Entry
        newEntry.setIntervalStartDate(event.date)
        newEntry.setIntervalEndDate(event.date)
        newEntry.setPulloverColor(event.pulloverColor)
        newEntry.setName(event.personName)
        Database.createNewEntry(newEntry)
    end
end
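The per-event logic above can run against an ordinary embedded database. A minimal sketch using Python's built-in sqlite3 module as the storage layer; the table and column names and the function `on_new_event` are hypothetical, and the extend-or-insert logic still lives in application code rather than in the database:

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a database holding one row per color interval."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS intervals (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               color TEXT NOT NULL,
               start_time TEXT NOT NULL,
               end_time TEXT NOT NULL)"""
    )
    # Speeds up the "latest interval for this person" lookup.
    db.execute("CREATE INDEX IF NOT EXISTS by_name ON intervals (name, id)")
    return db


def on_new_event(db, timestamp, name, color):
    """Extend the person's latest interval if the color is unchanged,
    otherwise start a new one (same logic as OnNewEvent)."""
    with db:  # commits on success, rolls back on error
        # Highest id = most recently created interval for this person.
        row = db.execute(
            "SELECT id, color FROM intervals WHERE name = ? "
            "ORDER BY id DESC LIMIT 1",
            (name,),
        ).fetchone()
        if row is not None and row[1] == color:
            db.execute(
                "UPDATE intervals SET end_time = ? WHERE id = ?",
                (timestamp, row[0]),
            )
        else:
            db.execute(
                "INSERT INTO intervals (name, color, start_time, end_time) "
                "VALUES (?, ?, ?, ?)",
                (name, color, timestamp, timestamp),
            )


db = open_db()
log = """TimeStamp01, Peter, Green
TimeStamp02, Bob, Blue
TimeStamp03, Peter, Green
TimeStamp04, Peter, Red
TimeStamp05, Peter, Green"""
for line in log.splitlines():
    on_new_event(db, *(field.strip() for field in line.split(",")))

for row in db.execute(
    "SELECT start_time, end_time, name, color FROM intervals ORDER BY id"
):
    print(", ".join(row))
```

The sketch assumes a single writer that receives events in timestamp order, since it treats the highest id per person as that person's latest interval.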