
I just saw this 2012 video from LinuxConf.au about Erlang in production.

There's a part of the video, around minute 29, where Bernard says that apart from Ericsson no big Erlang projects use hot code loading, because it's really hard to guarantee that things will work.

Is that still true? Are there tools nowadays to help test a hot code load or make it easier?

Juliano
**Relups**: insanely hard to get right unless you develop with this case in mind from the outset (and at least half of that is *developing* documentation and specs in parallel with executable code). **Hot code loading**: too useful to ignore if you do anything that deals with live user data (think "game/chat/business application/peer-[anything] services"). For these sorts of massively concurrent services, "no hot loading" vs. "hot loading" is, in developer terms, analogous to "waiting for a huge C++ project to compile before being able to test a change" vs. "testing an idea in a Python shell". – zxq9 Jun 28 '15 at 13:36

2 Answers


This is not true. Every Erlang user takes advantage of hot code loading in one way or another, whether for development, testing, troubleshooting, one-off fixes, or full-scale deployments. It is one of Erlang's major advantages, and a fairly unique one.

For example, WhatsApp, one of the biggest Erlang users, relies on hot code loading for almost all code pushes.

I have personally worked with hot code loading in scenarios where each change was well understood and the upgrade was often performed by the same person who made the change. It works extremely well, and good engineers have no problems doing it. As for tools, loading modules one by one from the Erlang shell using `l(...).` or all at once using `l().` (see here) works just fine. Some prefer release-based tools like relx.
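Outside the shell, the same kind of reload can be scripted with the `code` module. A minimal sketch, with a hypothetical module name `safe_reload`: it loads the newest `.beam` for a module from the code path, but backs off when `code:soft_purge/1` reports that some process is still running the module's old version, rather than killing that process the way a hard purge would.

```erlang
-module(safe_reload).
-export([reload/1]).

%% Reload Module from the code path, refusing to kill processes that
%% are still executing its old version.
reload(Module) ->
    case code:soft_purge(Module) of
        true ->
            %% No old code is left (or none existed): load the new version.
            %% Returns {module, Module} or {error, Reason}.
            code:load_file(Module);
        false ->
            %% Some process still runs the old version; a hard purge
            %% (code:purge/1) would kill it, so report instead.
            {error, old_code_in_use}
    end.
```

Backing off instead of killing is a design choice for production-style pushes; during development the shell's harder behaviour is usually what you want.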

Others, like Ericsson, use enterprise-style deployments with hot code loading after rigorous testing of clear-cut releases and patches. The goal here is to upgrade without using spare capacity and special procedures for draining and shifting load. Operationally this may be simpler and more efficient than restarts, but testing can be more expensive.
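For the release-based style, much of the testing effort goes into state migration. A minimal sketch, assuming a hypothetical `kv_server` whose version 1 kept its state as a proplist and whose version 2 keeps a map: the appup instruction `{update, kv_server, {advanced, []}}` asks the release handler to suspend the process, swap the code and call `code_change/3`, which has to convert the state in both directions (on a downgrade the old version arrives as `{down, Vsn}`).

```erlang
-module(kv_server).
-behaviour(gen_server).

%% Client API (hypothetical example module).
-export([start_link/0, store/2, lookup/1]).
%% gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2, code_change/3]).

%% Matching entry for a hypothetical kv.appup (1.0.0 <-> 2.0.0):
%% {"2.0.0",
%%  [{"1.0.0", [{update, kv_server, {advanced, []}}]}],
%%  [{"1.0.0", [{update, kv_server, {advanced, []}}]}]}.

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

store(Key, Value) ->
    gen_server:call(?MODULE, {store, Key, Value}).

lookup(Key) ->
    gen_server:call(?MODULE, {lookup, Key}).

%% Version 2 keeps its state in a map.
init([]) ->
    {ok, #{}}.

handle_call({store, Key, Value}, _From, State) ->
    {reply, ok, State#{Key => Value}};
handle_call({lookup, Key}, _From, State) ->
    %% Replies {ok, Value} or error.
    {reply, maps:find(Key, State), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Called while the process is suspended during an upgrade or downgrade.
%% Version 1 (hypothetical) kept its state as a proplist.
code_change({down, _ToVsn}, State, _Extra) when is_map(State) ->
    %% Downgrade to version 1: map -> proplist.
    {ok, maps:to_list(State)};
code_change(_FromVsn, State, _Extra) when is_list(State) ->
    %% Upgrade from version 1: proplist -> map.
    {ok, maps:from_list(State)};
code_change(_OldVsn, State, _Extra) ->
    %% State already has the current shape.
    {ok, State}.
```

Because `code_change/3` is a plain function, both directions can be unit-tested directly, which covers part (though far from all) of what a tested release upgrade needs.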

alavrik
+1 on the "in development, testing, troubleshooting, one-off fixes" bit. These cases in particular are profoundly useful. While I don't have any production systems that regularly undergo hot updates, I also don't have any development or testing systems that don't undergo multiple hot code changes per day; it's just so immensely useful. Genuine relups: (almost) never; hot code loads: always and constantly. – zxq9 Jun 28 '15 at 13:29

It is difficult to know whether the feature is widely or rarely used; there are plenty of Erlang systems out there nowadays. I can, however, think of reasons both for and against using it, since I have been working with both options for quite some time.

In favour of using it:

  • It is obviously quite useful during development, where it gives a fast feedback cycle. I always develop with an open shell and with functions that load code automatically as soon as it compiles.
  • In the rare case that you need to implement a monolithic application with high-availability requirements, it is basically the only option.
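The automatic reloading mentioned in the first point can be sketched with `code:modified_modules/0` (OTP 20 and later), which lists loaded modules whose `.beam` file on disk differs from the code in memory. The module name `dev_reload` is hypothetical; the shell's `lm()` command does much the same thing.

```erlang
-module(dev_reload).
-export([reload_modified/0, reload/1]).

%% Reload every loaded module whose .beam on disk has changed,
%% returning {Module, Result} for each one.
reload_modified() ->
    [{Module, reload(Module)} || Module <- code:modified_modules()].

%% Hard purge (usually acceptable during development): any process still
%% running the version before the current one is killed, so that the
%% new version can be loaded.
reload(Module) ->
    _ = code:purge(Module),
    code:load_file(Module).
```

Calling `dev_reload:reload_modified().` after each rebuild (or from a timer or file watcher) keeps the running shell in sync with the compiled code.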

The main reason not to use it is the one the presentation states: it is hard, even once you manage to understand exactly how it works (which is not the hardest part).

In my opinion it is not just a problem of tooling: you take on a lot of intrinsic complications simply because your code is now part of the mutable running state of the system. You end up with a long-running system that changes behaviour, which adds these problems to the ones you already had:

  • You can no longer be sure that restarting the system will not change its behaviour in some fundamental way, so you need to take extra care that whatever code you load is also written to disk.
  • Changing the way your modules work (i.e. loading new code) is very tricky unless a) you never break compatibility, b) you somehow figure out the order in which the modules should change, or c) you assume the worst that can happen is a few crashes due to undefined functions, function or case clauses, etc., and hope for the best. (The actual worst case is when new and old modules interact in unexpected ways before you have finished loading all the new ones, and actually run some impossible logic.)
  • You will almost certainly end up, at some point, killing some process that is still running old code when you load new code. Maybe your supervisors will help you, maybe not. Either way, that can be very confusing and difficult to debug.
  • As the presentation also states, it is very hard (if not impossible) to test.
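A minimal sketch of where these compatibility problems come from, using a hypothetical `counter` module. A process only switches to newly loaded code at a fully qualified call such as `?MODULE:loop(New)`. If version 2 changed the state from an integer to, say, a map, that call would hand the old integer to the new code, which is the "impossible logic" case mentioned above. Also, a process blocked in `receive` keeps running the old version until its next message arrives; if yet another version is loaded before then, the hard purge that the shell's `l(counter)` performs first kills it.

```erlang
-module(counter).
-export([start/0, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

%% loop/1 must be exported so the fully qualified call below can reach it.
loop(Count) ->
    receive
        {incr, From} ->
            New = Count + 1,
            From ! {count, New},
            %% Fully qualified call: this is the point where the process
            %% switches to the newest loaded version of counter. A plain
            %% loop(New) would keep running the version it started in.
            ?MODULE:loop(New);
        stop ->
            ok
    end.
```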

Etc.

On top of all that, you are running a long-lived server with long-lived state, which is far from ideal.

So my advice is always: if you can get away with a distributed application and rolling upgrades, do that. That option is much easier to handle and, in my experience, performs better overall.

Samuel Rivas
To be fair, "running a long living server with long living state" is exactly the type of problem Erlang was originally designed for. Moreover, rolling upgrades become an easy option for some applications precisely because L2 and L3 routers/switches/load balancers stay live with nine-nines availability and are rarely taken out of service for maintenance. In telecom, in many cases there isn't much spare capacity, and migrating millions of live connections may be impractical and/or much harder than doing a live upgrade on such a system. – alavrik Jul 02 '15 at 07:59
Yes, Erlang's hot code loading was designed for that, and it solves the problem in a unique way that makes Erlang a powerful tool in those scenarios. But is hot code loading an advantage for most production Erlang environments today? I don't think so. Just because Erlang can do hot code loading doesn't mean you should. My whole point is that maintaining such systems is intrinsically very hard, so one should avoid going there unless absolutely needed. – Samuel Rivas Jul 02 '15 at 09:12