What is the difference between the alpine Docker image and the busybox Docker image?

When I check their Dockerfiles, alpine looks like this (for Alpine v3.12 - 3.12.7):

FROM scratch
ADD alpine-minirootfs-3.12.7-x86_64.tar.gz /
CMD ["/bin/sh"]

and busybox looks like this:

FROM scratch
ADD busybox.tar.xz /
CMD ["sh"]

But as https://alpinelinux.org/about/ says

Alpine Linux is built around musl libc and busybox.

So what exactly is the difference?

I am also curious why many Docker images (nodejs, nginx, php, to name just a few) provide variants based on alpine but not on busybox. Why is that? What is the use case for the busybox image, then? I should emphasize that I am not looking for an answer about why A is better than B or vice versa, nor for a software recommendation.

I have been experiencing intermittent DNS lookup failures in my alpine containers, as described in "musl-libc - Alpine's Greatest Weakness" and in "Does Alpine have known DNS issue within Kubernetes?". That is one of the reasons I asked this question.

PS: https://musl.libc.org/ says "musl is an implementation of the C standard library built on top of the Linux system call API", and https://en.wikipedia.org/wiki/Alpine_Linux mentions:

It previously used uClibc as its C standard library instead of the traditional GNU C Library (glibc) most commonly used. Although it is more lightweight, it does have the significant drawback of being binary incompatible with glibc. Thus, all software must be compiled for use with uClibc to work properly. As of 9 April 2014,[16] Alpine Linux switched to musl, which is partially binary compatible with glibc.

Qiulang
  • Which libc is the `busybox` image built against? We'd need to analyze `busybox.tar.xz` to know. – Charles Duffy May 18 '21 at 13:24
  • I wouldn't be surprised if it were statically linked, but even then, there's a huge difference in size between statically-linked-against-glibc and statically-linked-against-musl. Basically, how that `busybox.tar.xz` was built needs to come into the question for this to be answerable. – Charles Duffy May 18 '21 at 13:25
  • (Mind, I consider this question likely to be off-topic as it's "seeking recommendations" between two alternative pieces of software; moreover, an analysis of what's different between them is not narrowly scoped or specific and thus arguably "too broad", and moreover is subject to change as new versions are rolled out). – Charles Duffy May 18 '21 at 13:26
  • But I really wasn't looking for recommendations (for anything). I just want to know the use case of the busybox Docker image. – Qiulang May 18 '21 at 13:28
  • ...if `alpine` provides musl as a shared library and `busybox` statically links musl, that means that `alpine` can result in a smaller image when you have other shared libraries added, but `busybox` will be smaller off the bat -- if that speculation were true, it would make `busybox` more efficient only when you're adding things like shell scripts that don't require more compiled binaries (at least, not compiled binaries that need a libc). – Charles Duffy May 18 '21 at 13:28
  • "When and why should I use software-X instead of software-Y?" is very much a recommendation request. – Charles Duffy May 18 '21 at 13:28
  • Anyhow, ignoring that, there's a bunch of investigation that needs to happen for this to be answerable in terms of how those tarballs are built; the alpine one's description goes into enough details, the busybox one currently doesn't. – Charles Duffy May 18 '21 at 13:29
  • OK, I will think about how to reword that. Again, I really wasn't looking for a recommendation. I am curious why so many images use alpine, not busybox, as their base. – Qiulang May 18 '21 at 13:30
  • Part of that is going to be marketing. Alpine is _described to potential users_ as something meant for them to use it as a base image for small single-purpose systems. There may not be any difference in actual suitability at all, but the difference in description and positioning influences choices. – Charles Duffy May 18 '21 at 13:33
  • (Market positioning also influences things like maintenance choices: If the goal is just to have a version of musl libc _that's good enough to run busybox_, then things like security fixes may be unimportant if they're fixes to parts of the libc that busybox doesn't use; whereas if alpine is shipping a dynamically compiled libc, its maintainers are positioning themselves to be responsible for keeping up-to-date even for changes that busybox doesn't care about). – Charles Duffy May 18 '21 at 13:37
  • BTW, personally, I consider both of these kind of awful (but then, I consider the Docker ecosystem as a whole kind of awful). The nixpkgs approach to building container images (or system images, or everything else) lets you specify _exactly_ what you want; tell Nix you want `pkgsMusl.busybox` and you get a busybox dynamically compiled against musl libc; tell it you want `pkgsStatic.busybox` and you get a busybox statically compiled against musl libc; whereas just `pkgs.busybox` is busybox compiled against glibc. And with `dockerTools` you can tell Nix to make a Docker image out of any of this. – Charles Duffy May 18 '21 at 14:27
  • And `pkgsStatic` and `pkgsMusl` [are just code](https://github.com/NixOS/nixpkgs/blob/8284fc30c84ea47e63209d1a892aca1dfcd6bdf3/pkgs/top-level/stage.nix#L212-L234) -- they don't do anything you couldn't write yourself in a few short lines, so if you wanted glibc-and-static, that's comparatively trivial ("comparatively" because of how glibc does DNS by `dlopen()`ing a stub resolver, which is also why the busybox image is no longer glibc-and-static) – Charles Duffy May 18 '21 at 14:28
  • This is the first time I've heard someone say the Docker ecosystem is awful, lol – Qiulang May 18 '21 at 14:36
  • If you have a few minutes, I'm happy to expand on that. – Charles Duffy May 18 '21 at 14:38
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/232555/discussion-between-charles-duffy-and-qiulang). – Charles Duffy May 18 '21 at 14:38

2 Answers

The key difference between these is that older versions of the busybox image statically linked busybox against glibc (current versions dynamically link busybox against glibc due to use of libnss even in static configuration), whereas the alpine image dynamically links against musl libc.

Going into the weighting factors used to choose between these in detail would be off-topic here (software recommendation requests), but some key points:

Comparing glibc against musl libc, a few salient points (though there are certainly many other factors as well):

  • glibc is built for performance and portability over size (often adding special-case performance optimizations that take a large amount of code).
  • musl libc is built for correctness and size over performance (it's willing to be somewhat slower to have a smaller code size and to run in less RAM); and it's much more aggressive about having correct error reporting (instead of just exiting immediately) in the face of resource exhaustion.
  • glibc is more widely used, so bugs that manifest against its implementation tend to be caught more quickly. Often, when one is the first person to build a given piece of software against musl, one will encounter bugs (typically in that software, not in musl) or places where the maintainer explicitly chose to use GNU extensions instead of sticking to the libc standard.
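  • glibc is licensed under LGPL terms; software statically linked against it must either be under GPL-compatible terms or be shipped in a form that lets users relink it against a modified glibc (for example, as object files); whereas musl is under an MIT license, and usable with fewer restrictions.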

Comparing the advantages of a static build against a dynamic build:

  • If your system image will only have a single binary executable (written in C or otherwise using a libc), a static build is always better, as it discards any parts of your libraries that aren't actually used by that one executable.
  • If your system image is intended to have more binaries added that are written in C, using dynamic linking will keep the overall size down, since it allows those binaries to use the libc that's already there.
  • If your system image is intended to have more binaries added in a language that doesn't use libc (this can be the case for Go and Rust, f/e), then you don't benefit from dynamic linking; you don't need the unused parts of libc there because you won't be using them anyhow.
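Whether a given busybox binary is a static or a dynamic build can be checked directly rather than guessed. The following is a minimal sketch, not taken from either image's build tooling: it reads the ELF program headers and reports whether a `PT_INTERP` entry (the request for a dynamic loader) is present. The function name `has_interpreter` is made up for illustration, and the absence of `PT_INTERP` is only a strong hint of static linking, not proof.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_interpreter(data: bytes) -> bool:
    """Return True if the ELF image asks for a dynamic loader (PT_INTERP)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = data[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little-endian
    if is_64bit:
        (phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        (phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x2A)
    # p_type is the first 4-byte field of every program header entry.
    for index in range(phnum):
        (p_type,) = struct.unpack_from(endian + "I", data, phoff + index * phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

Given a binary copied out of an image, `has_interpreter(open(path, "rb").read())` returning `False` suggests a static build; `True` means the binary depends on a loader and, usually, on a shared libc being present in the image.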

Honestly, these two images don't between themselves cover the whole matrix space of possibilities; there are situations where neither of them is optimal. There would be value to having an image with only busybox that statically links against musl libc (if everything you're going to add is in a non-C language), or an image with busybox that dynamically links against glibc (if you're going to add more binaries that need libc and aren't compatible with musl).

Charles Duffy
  • Thanks for the detailed answer. I guess the MIT license is maybe a key factor. – Qiulang May 18 '21 at 14:31
  • _shrug_. The LGPL (unlike the regular GPL) is friendly to dynamic linking, which is why glibc is widely used even for commercial software; most folks don't need to statically link their commercial tools. – Charles Duffy May 18 '21 at 14:32
  • https://wiki.musl-libc.org/ said "Some of musl’s major advantages over glibc and uClibc/uClibc-ng are its size, correctness, static linking support, and clean code" – Qiulang May 18 '21 at 14:35
  • Yes, all that is true. That said, glibc being "the standard" means there are some pretty powerful network effects in its favor; needing to port software to build against musl is work, whereas basically everything works with glibc out-of-the-box. – Charles Duffy May 18 '21 at 14:36
  • Hi, I hit this question again and posted an answer, but it still leaves many questions open. Can you take a look? – Qiulang Jul 21 '22 at 03:36

When I first asked the question I was not sure about the use case of the busybox Docker image, and my link to the busybox Dockerfile was not entirely correct. The correct Dockerfile link explains many things: busybox provides 3 different variants, built on glibc, musl, and uclibc.

busybox dockerfile

A more appropriate question is: what is the difference between the alpine image and the busybox image built on musl? I still don't know the answer, except that the alpine image is more actively maintained.

"Use Cases and Tips for Using the BusyBox Docker Official Image" was published Jul 14, 2022 (so quite new) and it said "Maintaining the BusyBox image has also been an ongoing priority at Docker."

I still hope someone can provide an answer about the use case of the BusyBox images built on glibc or uclibc.

--- update ---

As the discussion in "package manager for docker container running image busybox:uclibc" puts it: "Anything based on Busybox doesn't have a package manager. It's a single binary with a bunch of symlinks into it, and the way to add software to it is to write C code and recompile." "Package manager for Busybox" explains the same thing. busybox does NOT have a package manager, which is probably the reason why most people use alpine.
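The "single binary with a bunch of symlinks" design works because the program picks its behaviour from the name it was invoked under. Here is a minimal sketch of that dispatch idea in Python; the applet table and the function name `run_multicall` are hypothetical, and the real BusyBox implements its applets in C.

```python
import os

# Hypothetical applet table, for illustration only.
APPLETS = {
    "echo": lambda args: " ".join(args),
    "basename": lambda args: os.path.basename(args[0]),
}


def run_multicall(argv):
    """Select an applet from the name the program was invoked under."""
    name = os.path.basename(argv[0])
    args = argv[1:]
    # Invoked as `busybox echo hi`: the applet name is the first argument.
    if name == "busybox" and args:
        name, args = args[0], args[1:]
    applet = APPLETS.get(name)
    if applet is None:
        raise KeyError(f"{name}: applet not found")
    return applet(args)
```

A symlink such as `/bin/echo` pointing at the one binary makes `argv[0]` end in `echo`, so the same executable behaves as `echo`. Adding a command means adding an entry to the table and rebuilding, which is why there is nothing for a package manager to install.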

As for the DNS failures I experience randomly, I find that "Why I Will Never Use Alpine Linux Ever Again" explains them well:

musl (by design) doesn't support DNS-over-TCP...The worst part is that this can manifest randomly, anytime when some external network change causes the resolution of some particular domain to require more than the 512 bytes available in a single UDP packet.

Finally, this DNS issue does not manifest in Docker container. It can only happen in Kubernetes...Kubernetes docs claim that DNS issues are relevant only for “Alpine version 3.3 or earlier”, but I encountered the above issue on Alpine 3.16, so goes figure.

Does Alpine resolve DNS properly? gave an exact account of the problem:

The TC bit is used when the DNS response the DNS server wants to send to the client is longer than the 512bytes available to it in a UDP packet ... This is a signal to the DNS resolver client that it needs to switch from a standard UDP DNS query and do a new TCP DNS query instead
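To make the TC bit concrete, here is a minimal sketch of how a resolver client could detect truncation in a raw UDP response. It assumes the standard 12-byte DNS header layout from RFC 1035, where the flags word follows the 16-bit query ID; the function name `is_truncated` is made up for illustration and is not musl's code.

```python
import struct

TC_MASK = 0x0200  # truncation bit within the 16-bit DNS flags word


def is_truncated(response: bytes) -> bool:
    """Return True if a raw DNS response has the TC (truncated) bit set."""
    if len(response) < 12:
        raise ValueError("a DNS message starts with a 12-byte header")
    # Header begins with the query ID and the flags word, both big-endian.
    _query_id, flags = struct.unpack_from("!HH", response, 0)
    return bool(flags & TC_MASK)
```

A resolver with TCP fallback repeats the query over TCP when this returns `True`; a resolver without it is left with whatever fit in the truncated UDP answer.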

Thankfully, the alpine 3.18 release announcement says this problem is now fixed: "musl libc 1.2.4 – now with TCP fallback in DNS resolver"

But the Docker images I use do not support alpine 3.18 yet (as of July 2023), so I will have to wait to see whether it works.

Qiulang
  • BTW, personally, I don't find the DNS argument compelling. I've done a lot of startups, and never run a site (when wearing an ops hat) where I didn't run my own local caching DNS server, and provided both TCP and UDP implementations as a matter of course. These days you've even got things like systemd-resolved that run a local DNS server _on each computer_ that forwards requests to external infrastructure. If you don't run your own DNS you can't do split-horizon, you end up exposing your internal hostnames to the world, so this is generally a situation I don't see why anyone would be in. – Charles Duffy Jul 01 '23 at 00:14
  • (there are also some security arguments: if you want to encrypt your upstream DNS traffic, you can depend on per-application support that may or may not exist, or you can just have a local proxy that's responsible for taking care of all of it; it's also not unheard of for DNS to be used for exfil or command-and-control by malware, so blocking anything but your proper corporate-IT-approved DNS service is defensible in security-sensitive contexts for that reason too) – Charles Duffy Jul 01 '23 at 00:19
  • So are you suggesting that "run my own local caching DNS server" is really necessary? Because I have only limited knowledge of Kubernetes, I really hope it can just work without extra setup. Besides, I did use dnsmasq but hit this problem: https://stackoverflow.com/questions/70471243/what-is-the-proper-way-to-start-dnsmasq-in-my-docker-entrypoint. Again, that is just another example showing that setting up Kubernetes correctly is a daunting task for me. I really hope it can just work. – Qiulang Jul 04 '23 at 10:17
  • @CharlesDuffy alpine 3.18 has added ["TCP fallback in DNS resolver"](https://www.alpinelinux.org/posts/Alpine-3.18.0-released.html) so I think my argument is valid. – Qiulang Jul 21 '23 at 02:23
  • Eh. Popular demand for something that only matters when people are Doing It Wrong just means very few people are Doing It Right. :) -- but do read this as coming from an old-fashioned UNIX neckbeard, which it is. I'm also deeply distrustful of Kubernetes as a whole, for that matter, and any other orchestration layer that permits too many things that happen out-of-view; it's easiest to do good work when everything is accessible for operators to diagnose, understand, reason about, and fix. Running your own DNS fits with that; makes for easier troubleshooting. – Charles Duffy Jul 21 '23 at 02:54
  • (if you can see unencrypted DNS lookups within your local network boundary -- obvs., don't want them unencrypted after they leave -- it makes working back to understand the rest of the traffic you captured that much easier; running your own DNS server thus both helps your site's performance, but also means you have visibility into the queries, ability to block requests associated from known malware from a single central place, &c) – Charles Duffy Jul 21 '23 at 02:59
  • I will agree with these words: “very few people are Doing It Right”. But the thing is many people don't have the knowledge and don't bother to learn either. Using me as an example, I am probably the one who knows Docker and Kubernetes best in my company and I am struggling to make it work. Other guys in my company don't even have the desire to learn. They always say docker/k8s stuff "intimidates" them. – Qiulang Jul 21 '23 at 09:15