First of all, I may be misinformed about Big Data capabilities nowadays, so don't hesitate to correct me if I'm too optimistic.
I usually work with regular KPIs, e.g. "show me the count of new clients who meet certain complex conditions (joining a few fact tables) for every manager during a certain month".
These requests are quite dynamic, so there is no way to predict what to pre-calculate. We use OLAP and MDX for dynamic reporting. The price of calculating on the fly is performance: users usually wait more than a minute for a result.
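To make the workload concrete, here is a minimal sketch of that kind of KPI in PySpark (since Spark comes up in my questions below). Every table and column name (`fact_sales`, `dim_client`, `manager_id`, the filter conditions) is made up for illustration:

```python
# Minimal sketch of the kind of KPI I compute today via MDX.
# All table/column names and conditions here are hypothetical.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("kpi-sketch").getOrCreate()

sales = spark.table("fact_sales")    # hypothetical fact table
clients = spark.table("dim_client")  # hypothetical client dimension

# "Count of new clients meeting complex conditions, per manager, for one month"
result = (
    sales.join(clients, "client_id")
         .where(F.col("first_purchase_date").between("2016-06-01", "2016-06-30"))
         .where(F.col("segment") == "retail")  # stand-in for a "complex condition"
         .groupBy("manager_id")
         .agg(F.countDistinct("client_id").alias("new_clients"))
)
result.show()
```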
That's how I got to Big Data. I've read some articles, forums, and docs, which led me to ambiguous conclusions: Big Data provides tools to handle data in seconds, yet it doesn't fit BI tasks like joins and pre-aggregation well, there is no classical DWH-over-Hadoop concept, and so on.
Nonetheless, that's just theory. I've found Kylin, which makes me want to give it a try in practice. The more I dig, the more questions appear. Some of them:
- Do I need any programming knowledge (Java, Scala, Python)?
- Do I need graphical tools, or is SSH access enough?
- What are the hardware requirements for 100-200 GB databases (and how many machines)?
- What's the best filesystem (ext4?), and should I care at all?
- How can I migrate data from an RDBMS? Are there any smart ETL tools? (See the sketch after this list.)
- What technologies should I learn and use first (Pig, Spark, etc.)?
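On the migration question: the closest thing I can picture to a "smart ETL" is Spark's built-in JDBC reader (Sqoop seems to be the other common option). A rough sketch, where the connection URL, credentials, and table name are placeholders and I'm assuming the JDBC driver is on the classpath:

```python
# Rough sketch of pulling an RDBMS table into Hadoop via Spark's JDBC reader.
# The URL, credentials, and table name below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms-import").getOrCreate()

df = (
    spark.read.format("jdbc")
         .option("url", "jdbc:postgresql://dbhost:5432/crm")  # placeholder URL
         .option("dbtable", "fact_sales")                     # placeholder table
         .option("user", "etl")
         .option("password", "...")
         .load()
)

# Land it on HDFS as Parquet, ready for Hive/Kylin
# (partitioning assumes the table has a "month" column).
df.write.mode("overwrite").partitionBy("month").parquet("hdfs:///dwh/fact_sales")
```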
I might actually be asking the wrong questions and totally misunderstanding the concept, but I'm hoping for some good leads. Feel free to give any advice you consider useful about consolidating BI and Big Data.
I know about http://kylin.apache.org/docs15/index.html, but I don't feel comfortable trying it without a backend background.