What I'm planning to do is develop a reporting platform using existing data. I have an existing RDBMS with a large number of records, so I'm using the following architecture: Hadoop 2.7, Spark, Hive, JasperReports and Sqoop.
- Sqoop - Extract data from the RDBMS into Hadoop
- Hadoop - Storage platform
- Hive - Data warehouse
- Spark - Since Hive is more like batch processing, Spark on Hive will speed things up
- JasperReports - To generate reports.
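To make the Sqoop step concrete, here is a minimal sketch, assuming Sqoop 1.x, that assembles a `sqoop import` command loading one RDBMS table straight into Hive. The helper `build_sqoop_import`, the JDBC URL, user, password-file path and table names are all hypothetical placeholders; only the Sqoop flags themselves (`--connect`, `--username`, `--password-file`, `--table`, `--hive-import`, `--hive-table`, `--num-mappers`) are real Sqoop import arguments.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table, hive_table, mappers=4):
    """Return the argv list for a Sqoop 1.x import of one RDBMS table into Hive.

    Hypothetical helper: every argument value is a placeholder for your environment.
    """
    if mappers < 1:
        raise ValueError("mappers must be >= 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source RDBMS
        "--username", username,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", table,                  # source table in the RDBMS
        "--hive-import",                   # load the imported files into Hive
        "--hive-table", hive_table,        # target Hive table (db.table)
        "--num-mappers", str(mappers),     # number of parallel map tasks for the extract
    ]


if __name__ == "__main__":
    argv = build_sqoop_import(
        jdbc_url="jdbc:mysql://db.example.com/sales",     # hypothetical source database
        username="report_user",                           # hypothetical user
        password_file="/user/report_user/.db_password",   # hypothetical path
        table="orders",                                   # hypothetical table
        hive_table="reporting.orders",                    # hypothetical Hive table
    )
    # Only prints the command; it could be run with subprocess.run(argv, check=True)
    # on a node where Sqoop is installed.
    print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`.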
Given that I have already read the following:
Which mode should I use? Why? What should the decision be based on?