I'm working on a project that will determine whether or not I score an internship. The project focuses on stream processing and is due in 2 weeks. It's pretty simple: derive some statistics from a CSV file and display them in a GUI. The project looks something like this:
A provided CSV is formatted as
ID: int, OperatingSystem: str, Date: str, Score: int
I'm supposed to track the lowest, highest, and median scores
- per OS,
- per date, and
- across the entire dataset
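For reference, the tracking described above can be sketched in plain Python without any framework, by reading the file one row at a time and updating running statistics. This is only a rough sketch under some assumptions: the CSV is comma-delimited with a header row using the column names listed above, the `RunningStats` class and `process` function are hypothetical names, and the sample rows (including the date format) are made up for illustration. The median uses the common two-heap technique so it can be updated as each row arrives.

```python
import csv
import heapq
import io
from collections import defaultdict


class RunningStats:
    """Tracks lowest, highest, and median of a stream, one value at a time."""

    def __init__(self):
        self.lowest = None
        self.highest = None
        self._lower = []  # max-heap (values negated) holding the smaller half
        self._upper = []  # min-heap holding the larger half

    def add(self, score):
        self.lowest = score if self.lowest is None else min(self.lowest, score)
        self.highest = score if self.highest is None else max(self.highest, score)
        # Push onto the lower half, then move its largest value to the upper half.
        heapq.heappush(self._lower, -score)
        heapq.heappush(self._upper, -heapq.heappop(self._lower))
        # Rebalance so the lower half is the same size or one element larger.
        if len(self._upper) > len(self._lower):
            heapq.heappush(self._lower, -heapq.heappop(self._upper))

    @property
    def median(self):
        if not self._lower:
            return None
        if len(self._lower) > len(self._upper):
            return -self._lower[0]
        return (-self._lower[0] + self._upper[0]) / 2


def process(rows):
    """Consume rows one by one, updating stats per OS, per date, and overall."""
    per_os = defaultdict(RunningStats)
    per_date = defaultdict(RunningStats)
    overall = RunningStats()
    for row in rows:
        score = int(row["Score"])
        per_os[row["OperatingSystem"]].add(score)
        per_date[row["Date"]].add(score)
        overall.add(score)
    return per_os, per_date, overall


# Made-up sample data; a real run would pass csv.DictReader(open(path, newline="")).
SAMPLE = """ID,OperatingSystem,Date,Score
1,Linux,2021-01-01,10
2,Windows,2021-01-01,30
3,Linux,2021-01-02,20
4,Linux,2021-01-02,40
"""

per_os, per_date, overall = process(csv.DictReader(io.StringIO(SAMPLE)))
print(overall.lowest, overall.highest, overall.median)
```

Because `csv.DictReader` yields rows lazily, the same loop would work on a file of any size, and the statistics are valid after every row rather than only at the end.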
Then I'm supposed to define a data structure for creating a histogram, also per date, per OS, and across the entire dataset. I can use any language that I want, but I'd prefer Python if possible.
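One possible shape for the histogram data structure, as a rough sketch only: fixed-width bins stored in a `Counter`, with one histogram per OS, one per date, and one for the whole dataset. The class names `Histogram` and `HistogramSet`, the bin width of 10, and the sample records are all assumptions for illustration; the project spec may call for different binning.

```python
from collections import Counter, defaultdict


class Histogram:
    """Counts integer scores in fixed-width bins, updated one score at a time."""

    def __init__(self, bin_width=10):
        self.bin_width = bin_width
        self.counts = Counter()  # maps bin start -> number of scores in that bin

    def add(self, score):
        # Floor division maps a score to the start of its bin, e.g. 17 -> 10.
        start = (score // self.bin_width) * self.bin_width
        self.counts[start] += 1

    def bins(self):
        """Return (bin_start, count) pairs sorted by bin start."""
        return sorted(self.counts.items())


class HistogramSet:
    """Keeps one histogram per OS, one per date, and one for the entire dataset."""

    def __init__(self, bin_width=10):
        self.per_os = defaultdict(lambda: Histogram(bin_width))
        self.per_date = defaultdict(lambda: Histogram(bin_width))
        self.overall = Histogram(bin_width)

    def add(self, os_name, date, score):
        self.per_os[os_name].add(score)
        self.per_date[date].add(score)
        self.overall.add(score)


# Made-up records standing in for rows streamed from the CSV.
hs = HistogramSet(bin_width=10)
for os_name, date, score in [
    ("Linux", "2021-01-01", 7),
    ("Linux", "2021-01-02", 12),
    ("Windows", "2021-01-01", 15),
]:
    hs.add(os_name, date, score)

# Text rendering of the overall histogram; a GUI would draw bars from bins().
for start, count in hs.overall.bins():
    print(f"{start}-{start + hs.overall.bin_width - 1}: {'#' * count}")
```

Storing only bin counts keeps memory use independent of the number of rows, which suits a stream; the trade-off is that the bin width has to be chosen up front.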
The problem is that I've never done any stream processing work before and I'm having trouble finding resources on how to actually put it into code. I've watched videos explaining Kafka and looked into the docs and code samples for the Faust and Maki Nage frameworks, but I've only gotten as far as crashing the program right off the bat and staring at doc pages, scratching my head.
Are there any simple, well-documented stream processing libraries that I should look into? Additionally, are there any resources that demonstrate how to actually write code for these libraries? YouTube seems to only focus on architectures and UML diagrams without any practical demonstrations, and I'm beginning to worry that I'll never understand how to build this project.
Thanks, Geisha