I have a crowdsourcing application. data from users is collected and then processed and then updated for everyone to see. The data collection is almost real time. The processing speed is increasing as the users (data nodes) are increasing. I need to scale this.
Looking at scaling for graph based models, mapreduce seems to be famous. Is there a benchmarking paper comparing it to other techniques? Pregel is impressive. Please point me to any leads about 'partitioning' in pregel i.e, how a graph can be partitioned intelligently so as to minimize processes lagging behind each other.