I'm involved in a project with two phases, and I'm wondering whether it counts as a big data project (I'm new to this field).
In the first phase I have this scenario:
- I have to collect a huge amount of data
- I need to store it
- I need to build a web application that shows the data to the users
In the second phase I need to analyze the stored data, build reports, and run some analyses on it.
To give an idea of the data volume: in one day I may need to collect and store around 86,400,000 records.
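To put that number in perspective, here is a rough back-of-the-envelope calculation. The daily record count is my real estimate; the average record size (`ASSUMED_RECORD_BYTES`) is only a placeholder I picked for illustration, so the storage figures are indicative at best.

```python
# Back-of-the-envelope sizing for the ingestion rate and raw storage volume.
RECORDS_PER_DAY = 86_400_000      # my estimate of records collected per day
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds

# ASSUMPTION: average serialized record size in bytes.
# This is a placeholder; it should be replaced with a measured value.
ASSUMED_RECORD_BYTES = 200

records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY
gb_per_day = RECORDS_PER_DAY * ASSUMED_RECORD_BYTES / 1e9
tb_per_year = gb_per_day * 365 / 1000

print(f"average rate: {records_per_second:,.0f} records/s")
print(f"raw volume:   {gb_per_day:.2f} GB/day, {tb_per_year:.2f} TB/year")
```

So the average load is about 1,000 records per second. Peaks may be higher, and replication or indexes would add to the raw volume.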
I was thinking of this kind of architecture:
- to collect the data, some asynchronous messaging technology such as ActiveMQ with the MQTT protocol
- to store the data, a NoSQL database (MongoDB, HBase, or another)
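To make the collection and storage idea more concrete, here is a minimal sketch of the flow I have in mind. It uses only the Python standard library: the `asyncio.Queue` is a stand-in for the message broker, and the `store` list is a stand-in for the NoSQL database. The record fields, `BATCH_SIZE`, and the queue size are assumptions for illustration, not real settings.

```python
import asyncio

BATCH_SIZE = 500  # ASSUMPTION: would need tuning against the real database


async def producer(queue: asyncio.Queue, n_records: int) -> None:
    """Stand-in for the devices that would publish messages over MQTT."""
    for i in range(n_records):
        # Hypothetical record layout, for illustration only.
        await queue.put({"sensor_id": i % 10, "seq": i, "value": float(i)})
    await queue.put(None)  # sentinel: no more records


async def consumer(queue: asyncio.Queue, store: list) -> None:
    """Stand-in for the subscriber that writes to the database in batches."""
    batch = []
    while True:
        record = await queue.get()
        if record is None:
            break
        batch.append(record)
        if len(batch) >= BATCH_SIZE:
            store.append(batch)  # one simulated bulk write per full batch
            batch = []
    if batch:
        store.append(batch)  # flush the final partial batch


async def run_pipeline(n_records: int) -> list:
    # A bounded queue makes the producer wait when the consumer falls behind.
    queue = asyncio.Queue(maxsize=1000)
    store = []
    await asyncio.gather(producer(queue, n_records), consumer(queue, store))
    return store


if __name__ == "__main__":
    batches = asyncio.run(run_pipeline(1_200))
    print(len(batches), "bulk writes,", sum(len(b) for b in batches), "records")
```

The point of the sketch is that the collectors and the database writer are decoupled, and that records are written in batches rather than one by one. In the real system the broker and the database client would take the place of the queue and the list.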
I think this would solve my first-phase problems.
But what about the second phase?
I was thinking about some big data software (such as Hadoop or Spark) plus some machine learning software, so that I can retrieve the data from the database, analyze it, and restructure or store it in a better form in order to build good reports and run some specific analyses.
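As an example of the kind of report I mean, here is a small plain-Python sketch that groups records by sensor and day and computes a few statistics. It is not Spark or Hadoop code; it only illustrates the group-and-aggregate step that such a tool would run at scale. The field names (`sensor_id`, `day`, `value`) and the sample records are made up for illustration.

```python
from collections import defaultdict


def summarize(records):
    """Group records by (sensor_id, day) and compute count, mean, min, max."""
    acc = defaultdict(lambda: {"count": 0, "total": 0.0,
                               "min": float("inf"), "max": float("-inf")})
    for r in records:
        s = acc[(r["sensor_id"], r["day"])]
        s["count"] += 1
        s["total"] += r["value"]
        s["min"] = min(s["min"], r["value"])
        s["max"] = max(s["max"], r["value"])
    # Turn the running totals into the final report rows.
    return {
        key: {"count": s["count"], "mean": s["total"] / s["count"],
              "min": s["min"], "max": s["max"]}
        for key, s in acc.items()
    }


# Hypothetical sample data, only to show the shape of the input.
sample = [
    {"sensor_id": "a", "day": "2024-01-01", "value": 10.0},
    {"sensor_id": "a", "day": "2024-01-01", "value": 20.0},
    {"sensor_id": "b", "day": "2024-01-01", "value": 5.0},
]

report = summarize(sample)
for key, stats in sorted(report.items()):
    print(key, stats)
```

Storing such pre-aggregated rows would probably make the web application's reports much cheaper than querying the raw records each time.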
I was wondering whether this is the best approach.
How would you solve this kind of scenario? Am I on the right track?
Thank you,
Angelo