how to partition graph for pregel to maximize processing speed?

Question

I have a crowdsourcing application. data from users is collected and then processed and then updated for everyone to see. The data collection is almost real time. The processing speed is increasing as the users (data nodes) are increasing. I need to scale this.

Looking at scaling for graph based models, mapreduce seems to be famous. Is there a benchmarking paper comparing it to other techniques? Pregel is impressive. Please point me to any leads about 'partitioning' in pregel i.e, how a graph can be partitioned intelligently so as to minimize processes lagging behind each other.

score 0 · Answer 1 · answered May 14 '12 at 08:52

The problem of partitioning a graph 'intelligently' in order to minimize execution time is an interesting one, however it's not simple and it depends on your data and your algorithm. You might find also that, in practice, it's not necessary and a random partitioning is sufficiently good.

For example, if you are interested in exploring Pregel-like approaches, you can have a look at Apache Giraph and experiment with different partitioning techniques.

how to partition graph for pregel to maximize processing speed?

1 Answers1

Linked