I've been experimenting with Flink streaming for a while, using benchmarks like the Yahoo streaming benchmark: https://github.com/yahoo/streaming-benchmarks which are supposed to stress the system but I never achieved a satisfying CPU utilization - in fact it was mainly as low as ~25% using all available system cores (parallelism = nodes*cores) and one TaskManager slot per core.
Recently, I started working with Gelly, Flink's Graph API using some of the provided example algorithms (e.g. Pagerank), batch-processing datasets varying from tens of thousands to hundreds of millions vertices.
I occupy four TaskManagers of 32 cores each, and as suggested by the documentation I set taskmanager.numberOfTaskSlots: 32
and parallelism.default: 128
.
Even if I increase these values, the average CPU utilization never reaches above 40%. Consequently, I achieve low performance as my resources are not fully utilized.
I also want to point out the fact that in some cases I have noticed better performance with lower parallelism levels (and CPU utilization).
What am I missing?