
On both v15 and v16 of Cephadm I am able to successfully bootstrap a cluster with 3 nodes. What I have found is that adding more than 26 OSDs on a single host causes cephadm's `ceph orch daemon add osd` to hang forever with no crash. Each of my nodes has 60 disks, which lsblk reports as /dev/sda through /dev/sdbh. The double-letter device names (/dev/sdaa and beyond) don't appear to be the problem, but rather the quantity of disks: I rebuilt the cluster and added the double-letter devices first, and again it hung indefinitely once I hit the 27th OSD. Resources do not appear to be an issue, unless this is an Ubuntu/Docker limitation on the number of containers? This is easily reproduced in a lab, as I have created several with identical results.
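For reference, a minimal sketch of the workflow that reproduces the hang; the host names, monitor IP, and the loop itself are illustrative placeholders rather than exact commands from my history:

```bash
# Bootstrap a small cluster and enroll the other hosts (placeholder names/IP).
cephadm bootstrap --mon-ip 10.0.0.1
ceph orch host add node2
ceph orch host add node3

# Add OSDs one device at a time on a single host.
# With 60 disks (/dev/sda .. /dev/sdbh), the command below hangs
# indefinitely once the 27th device on that host is reached.
for dev in /dev/sd{a..z} /dev/sda{a..z} /dev/sdb{a..h}; do
    ceph orch daemon add osd "node1:${dev}"
done
```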

  • It sounds a bit like [this](https://tracker.ceph.com/issues/50526) and [that](https://tracker.ceph.com/issues/48292) but I'm not sure if it's really applicable since you report a limit of 26 OSDs. – eblock May 28 '21 at 06:52
  • Thanks eblock, I did encounter those as I've been scouring the bug tracker. I also found references to increasing the fs.aio-max-nr value, but that didn't address the issue either (roughly the change sketched after these comments). I created a tracker account so that I can submit the bug, but I'm waiting for approval before I can submit. – Dean Benson May 28 '21 at 12:41
  • Quincy will have a new ssh library. Can you replicate the same error with the current master branch? (As of December 2021) – Sebastian Wagner Dec 17 '21 at 11:49
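For completeness, the fs.aio-max-nr change referenced in the comments looks roughly like the following; the value shown is illustrative, and as noted above it did not resolve the hang in this case:

```bash
# Check the current async I/O request limit
sysctl fs.aio-max-nr

# Raise it for the running kernel (value is illustrative)
sudo sysctl -w fs.aio-max-nr=1048576

# Persist the setting across reboots
echo 'fs.aio-max-nr = 1048576' | sudo tee /etc/sysctl.d/99-aio-max-nr.conf
```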

1 Answer


It's been a while since I posted this; I came across the fix in late fall 2021, which was to use an HWE (Hardware Enablement) kernel from Ubuntu. Once the HWE kernel was installed (at the time), that got me over the hump.
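For anyone hitting the same thing, installing the HWE kernel stack looks roughly like this; I'm assuming Ubuntu 20.04 here, and the package name differs for other releases:

```bash
# Install the Hardware Enablement (HWE) kernel stack on Ubuntu 20.04
sudo apt update
sudo apt install --install-recommends linux-generic-hwe-20.04

# Reboot into the new kernel and confirm the version
sudo reboot
uname -r
```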