I have Slurm 23.11.0-0rc1 installed (from source) on an Ubuntu 18.04 machine. Unfortunately, this node is part of a cluster whose other nodes run Ubuntu 22.04. Things were going well until the Ubuntu 18.04 machine stopped responding to jobs submitted to it because of its old version of glibc. So I built libc6 version 2.35 from source into the directory /opt/glibc, ran ldconfig -n /opt/glibc/lib, and appended the new library path (i.e., /opt/glibc/lib) to the /etc/ld.so.conf.d/libc.conf file, but the system crashed (it no longer responds correctly to any command). When I googled it, many sites said that glibc is an essential library and that modifying it can harm the system (so I guess what I did was a stupid move). Now I can't edit /etc/ld.so.conf.d/libc.conf to remove the appended path. I also tried to scp a correct copy of the file into place, but that failed too.

I wonder what I should do about it? Also, how can I install the new libc6 (version 2.35) on Ubuntu 18.04 without breaking the system? I tried to use patchelf as indicated here, but it didn't work (at least for me). I wonder if I should instead downgrade libc6 on the other nodes, but I'm not sure whether that would break those systems as well.

Regards

shambakey1
  • If you do acknowledge that "glibc is an essential library and modification to it can harm the system", why not first remove this 18.04 machine from the cluster? Besides, 18.04 reached end of life, so you should upgrade that machine to 22.04 and then put it back. – Lex Li Jul 11 '23 at 21:32
  • Because this machine hosts an M10 Tesla GPU. We could only support, AFAWK, this card on Ubuntu 18.04 (not higher). Previously, there was no problem with integrating this node into the cluster since all nodes ran the same Slurm version, but I think regular system updates on the other nodes upgraded their libc6, which, I think, caused the problems with job submission to the GPU node. I still wonder whether downgrading libc6 on the other nodes (e.g., from 2.35 to 2.34) can solve the problem, or whether there's a way to install a newer libc6 on Ubuntu 18.04. – shambakey1 Jul 12 '23 at 08:20

1 Answer

I wonder what I should do about it?

The only way to recover is to boot the system from a recovery disk and restore the damaged system GLIBC from it.
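Going by the question's description, the change that would need undoing from the recovery environment is the line appended to /etc/ld.so.conf.d/libc.conf. A minimal sketch of that step, assuming the damaged root filesystem has been mounted somewhere such as /mnt (the mount point, the device name, and the helper name remove_ld_path are illustrative assumptions, not part of the original answer):

```shell
#!/bin/sh
# Hypothetical helper: remove one library path from an ld.so.conf.d file
# that lives under a root filesystem mounted from a rescue environment.
# Usage: remove_ld_path ROOT CONF_RELATIVE_PATH LIB_PATH
remove_ld_path() {
    root=$1
    conf=$2
    libpath=$3
    file="$root/$conf"

    # Refuse to continue if the file is not where we expect it.
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }

    # Keep every line except the one that exactly equals the path.
    # grep exits 1 when nothing is left to print, which is not an error here.
    grep -vxF -- "$libpath" "$file" > "$file.tmp" || true

    # Write back in place so ownership and permissions are preserved.
    cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}

# Example session from a live/rescue system (run as root; /dev/sdXN is a
# placeholder for the real root partition):
#   mount /dev/sdXN /mnt
#   remove_ld_path /mnt etc/ld.so.conf.d/libc.conf /opt/glibc/lib
#   ldconfig -r /mnt     # rebuild the linker cache relative to /mnt
#   umount /mnt
```

If system library files themselves were overwritten, this alone would not be enough, and the libc6 package would have to be restored as the answer says.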

Also, how to install the new libc6 (version 2.35) on Ubuntu 18.04 without breaking the system?

See this answer.
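One general way to use a privately built glibc without touching the system linker configuration is to start a program through that glibc's own dynamic loader. The sketch below assumes the /opt/glibc prefix from the question and the x86-64 loader name ld-linux-x86-64.so.2; the helper name run_with_glibc and the EXTRA_LIB_DIRS variable are hypothetical, and this is not necessarily the method the linked answer describes:

```shell
#!/bin/sh
# Hypothetical helper: run a single program against a glibc installed
# under PREFIX, leaving /etc/ld.so.conf.d and the system libc untouched.
# Usage: run_with_glibc PREFIX /path/to/program [args...]
run_with_glibc() {
    prefix=$1
    shift

    # Loader name is an assumption for x86-64; other architectures differ.
    loader="$prefix/lib/ld-linux-x86-64.so.2"
    [ -x "$loader" ] || { echo "loader not found: $loader" >&2; return 1; }

    # The program must be given as a path: the loader does not search PATH.
    # EXTRA_LIB_DIRS (optional, colon-separated) lists directories holding
    # the program's other, non-glibc libraries.
    "$loader" --library-path "$prefix/lib${EXTRA_LIB_DIRS:+:$EXTRA_LIB_DIRS}" "$@"
}

# Example (paths are placeholders):
#   EXTRA_LIB_DIRS=/usr/lib/x86_64-linux-gnu run_with_glibc /opt/glibc /path/to/program
```

Only the wrapped process sees the new glibc, so a mistake affects that one program rather than every command on the machine.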

Employed Russian