What are the advantages of Cassandra over HBase for MapReduce jobs?
I have a lot of small files that I would like to move from HDFS to a database, and those files would be the input for MapReduce jobs. A job would not read all the files, only those for a certain user, so possibly a whole row, or at least a column family. I might also select files from a certain period.
I know that HBase is the Hadoop database, so I expect it to integrate well for what I need, but I have also read that Cassandra has much better performance. What is the situation when each is used as the input for MapReduce jobs? Is Cassandra's performance still a lot better than HBase's in that case?
I must emphasize that I'm not looking for a comparison of HBase and Cassandra in general, but for the concrete case of MapReduce jobs. Questions like this one do not address performance for MapReduce jobs specifically. I'm also looking for fresh information (the question I mentioned is from 2011, and I believe there may have been some changes since then).