What is the difference between StatsD and Riemann, and which one performs better on large-scale distributed systems? We have a distributed platform built on Java, and we want to monitor application metrics and perhaps set up some alerts. We understand that instrumentation is not free, so ideally we are looking for a highly scalable application monitoring framework that adds the least instrumentation cost to our platform/apps and can do all sorts of aggregations. I also understand we could build something that combines both, but I can't think of a reason to, since both seem to do aggregations. I am unable to identify which one would be a better fit, or why one performs better than the other. It would be a great help if someone could share their experience with these tools in industry.
2 Answers
I don't have hard numbers on statsd, but GitHub's Brubeck post suggests that they were losing about 40% of their statsd events at--I'm guessing these graphs are in seconds--25,000 events/sec. Their replacement for statsd, written in C, is pushing 4.3 million events/sec. http://githubengineering.com/brubeck/
Riemann won't compete with that on a per-packet basis, but in batches of, say, 10-100 metrics/message, I've heard multiple production users report 10 million events/sec. Unlike statsd, Riemann will scale to all available cores--I've saturated both network interfaces and all 48 cores on my box in tests--but actual performance is gonna vary depending on contention and what you do with your streams. Could be much slower. All depends.
Compared to statsd, Riemann has a much richer event model and performs arbitrary computation. A small Riemann config can replicate statsd's functionality--but Riemann really shines when you need multidimensional rollups, state transition detection, integration with all kinds of other storage and alerting services, flap suppression, flow control, etc.
The cost of that is working in a programming language--Clojure--that may be unfamiliar to your team, and having to reason more carefully about scope, state, and, if you're writing your own streams, concurrency. Riemann also isn't as widely deployed, which could be a drawback in terms of library support and hiring staff.
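To illustrate the batching point above (10-100 metrics per message), here is a minimal Java sketch of client-side batching. `MetricBatcher` and its `sink` callback are hypothetical names made up for this example, not part of any Riemann or statsd client library; the sink stands in for whatever code actually sends one network message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers events and hands them to a sink in batches. */
final class MetricBatcher<E> {
    private final int batchSize;
    private final Consumer<List<E>> sink; // assumed: sends one message per batch
    private final List<E> buffer = new ArrayList<>();

    MetricBatcher(int batchSize, Consumer<List<E>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one event and flushes once the buffer reaches batchSize. */
    synchronized void add(E event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered as one batch; does nothing if empty. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy so the sink owns its list
        buffer.clear();
    }
}
```

A real setup would probably also flush on a timer so that a slow trickle of events does not sit in the buffer indefinitely.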

-
Hi Kyle! Thanks a lot for your response. Riemann does seem to have bindings in other languages, right (riemann.io/clients.html)? In my case there is a Java client library, so I wonder why I would need to work with Clojure. Does Riemann have all the same aggregations as StatsD? Finally, do you have any idea how many nodes they used to scale to 10 million/sec? – user1870400 May 11 '16 at 18:09
-
Riemann also seems to use Ruby for its dashboard. That is another thing developers need to know, manage, and plan for in deployment. I would like to stay away from Ruby as much as possible. – user1870400 May 11 '16 at 18:29
-
So we need to use Clojure in config files? Is that how it works? Where does Riemann store its data? What if I want to store the aggregated data in a database like InfluxDB to do historical analysis later? – user1870400 May 11 '16 at 22:55
-
10million/sec is on a single node, yes, it has all the aggregations in statsd, yes, Clojure is the config language, no, Riemann doesn't store any data, and yes, Riemann talks to Influx and dozens of other services, and by the way, all of this is in the comprehensive documentation on http://riemann.io. :) – Kyle Kingsbury May 15 '16 at 16:51
-
Everything sounds good, but I'm not convinced about the Ruby dashboard. – user1870400 May 17 '16 at 00:13
The best performer would be Brubeck, which is StatsD-compatible, so you can use the same StatsD client libraries to connect to it.
Brubeck is written in C, whereas StatsD is written in Node.js. As GitHub explained in their article, they consider Node.js a foreign technology and have gradually replaced the Node.js services they had. StatsD was one of them, due to performance issues.
The second best in performance would be Riemann (however, it needs its own client libraries). Statsd would be the slowest.
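Client libraries carry over between StatsD and a StatsD-compatible server because both accept the same plain-text lines over UDP, in the form `name:value|type`. The following is a minimal Java sketch of that wire format, not a replacement for a real client library. The class name `StatsdLines` and the metric names in the usage note are made up for illustration, and port 8125 is assumed because it is StatsD's conventional default.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing the StatsD text line format. */
final class StatsdLines {
    private StatsdLines() {}

    /** Counter line, e.g. "logins:1|c". */
    static String counter(String name, long delta) {
        return name + ":" + delta + "|c";
    }

    /** Timer line in milliseconds, e.g. "db.query:320|ms". */
    static String timer(String name, long millis) {
        return name + ":" + millis + "|ms";
    }

    /** Fire-and-forget send of one line as a single UDP datagram. */
    static void send(String host, int port, String line) throws IOException {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(
                    payload, payload.length, InetAddress.getByName(host), port));
        }
    }
}
```

For example, `StatsdLines.send("localhost", 8125, StatsdLines.counter("logins", 1))` would emit one counter increment. Opening a socket per call keeps the sketch short; a real client would reuse one socket.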
