I have a situation where bash's wildcard expansion sometimes does not work in my automated build (it is similar to this question). The whole thing runs inside a chroot created inside a Docker container, so there could be many reasons why this is broken (broken libc, broken shell, etc.). I tried using strace, but the result does not help me analyze the issue.

The first line of the working case shows the expanded file-name:

$ ls /tmp/
linux-image-4.9.124.deb
$ strace ls /tmp/linux*deb
execve("/bin/ls", ["ls", "/tmp/linux-image-4.9.124"...], [/* 23 vars */]) = 0
...

And the failing case shows that the * did not get expanded:

$ ls /tmp/
linux-image-4.9.124.deb
$ strace ls /tmp/linux*deb
execve("/bin/ls", ["ls", "/tmp/linux*deb"], [/* 23 vars */]) = 0
...

`set -o` shows `noglob off` in both cases.
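For context, a POSIX shell leaves a pattern unchanged when pathname expansion finds no match, so an unexpanded `/tmp/linux*deb` in the `execve` line means the shell found nothing to match. The sketch below illustrates that behaviour outside the chroot. The helper names `expand_in_sh` and `demo` and the `missing*deb` pattern are made up for illustration, and it assumes a POSIX `sh` is on the PATH.

```python
import os
import subprocess
import tempfile


def expand_in_sh(pattern):
    """Return the words that sh produces when it expands `pattern`."""
    # The unquoted $1 undergoes pathname expansion inside the shell;
    # printf then prints each resulting word on its own line.
    result = subprocess.run(
        ["sh", "-c", 'printf "%s\\n" $1', "sh", pattern],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def demo():
    """Expand one matching and one non-matching pattern in a scratch directory."""
    with tempfile.TemporaryDirectory() as tmp:
        # Same file name as in the listing above, created in a temporary directory.
        open(os.path.join(tmp, "linux-image-4.9.124.deb"), "w").close()
        matched = expand_in_sh(os.path.join(tmp, "linux*deb"))
        # Hypothetical pattern that matches nothing in the directory.
        unmatched = expand_in_sh(os.path.join(tmp, "missing*deb"))
    return matched, unmatched


if __name__ == "__main__":
    matched, unmatched = demo()
    print("match:   ", matched)
    print("no match:", unmatched)
```

The second result is the literal pattern, which is the same shape as the failing trace: the shell could not find any directory entry to match against.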

How can I debug this, for instance with strace, gdb, or any other tool?

Étienne
  • Do the answers in [Test whether a glob has any matches in bash](https://stackoverflow.com/q/2937407/5291015) solve your problem? – Inian Apr 22 '20 at 08:36
  • If so, I'll just close this as a duplicate. If not, explain your problem clearly to make it distinguishable from the other – Inian Apr 22 '20 at 08:36
  • So you're suggesting "strace test -e /tmp/linux*deb" ? – Étienne Apr 22 '20 at 08:37
  • I don't really see how the linked question helps, can you be more specific? – Étienne Apr 22 '20 at 08:41
  • So if I understand your intention correctly, you want to check whether the glob expansion succeeded or not? – Inian Apr 22 '20 at 08:44
  • Are you claiming that in the failing case `linux-image-4.9.124.deb` exists, but `strace` shows the glob unexpanded? – Inian Apr 22 '20 at 08:51
  • @Inian correct, this is what my logs are showing. I know it is very strange, but it could be a broken libc or something like that. "compgen" from the linked answer is a bash built-in, so I also cannot start it as a gdb session to analyze the issue, as far as I understand. – Étienne Apr 22 '20 at 08:54
  • Can you give more information on the two cases? For example, do they report the same `echo $BASH_VERSION`? – Philippe Apr 22 '20 at 10:17
  • @Philippe it should be the same shell, because it is the same git commit ID which sometimes compiles fine and sometimes fails on a build server using the same docker container as build environment (however echo $BASH_VERSION prints nothing because this is running in a chroot). The command is run using Python's subprocess.Popen with the option "shell=True": "chroot /mychroot /bin/sh with STDIN ls /tmp/linux*deb" See https://github.com/Linutronix/elbe/blob/master/elbepack/shellhelper.py#L81 – Étienne Apr 22 '20 at 11:33
  • @Philippe I double-checked and "env" gives the same result in both cases. – Étienne Apr 22 '20 at 11:50
  • How did you set up the two chroot environments? In exactly the same way? – Philippe Apr 22 '20 at 12:38
  • Yes, exactly the same way. The chroot is set up as part of the build process, and then the ls command sometimes works and sometimes does not. If I had to guess, I would say that the Python call to "subprocess.Popen" with "shell=True" sometimes calls the correct shell to do the wildcard expansion (the 32-bit shell from the chroot) and sometimes calls the wrong shell (the 64-bit shell from the host). However this is just a wild guess (similar to https://stackoverflow.com/questions/42347022/using-pythons-subprocess-call-or-os-system-inside-a-chroot-jail) – Étienne Apr 22 '20 at 12:41
  • Is it possible for you to paste how the chroot was set up, so that others can reproduce it? – Philippe Apr 22 '20 at 13:04
  • Not really, I am not allowed to, and unfortunately I didn't manage to create a minimal reproducible example which I could share. It is a project based on the elbe build system, which itself uses debootstrap to create the 32-bit chroot environment (the environment inside the chroot is based on Debian). It fails 50% of the time on some machines, but on my PC, for instance, it works 100% of the time with the same docker container, so it is really hard to reproduce. – Étienne Apr 22 '20 at 13:09
  • @Philippe thanks for your help debugging this. I actually found the issue and documented it there, if you are interested: https://unix.stackexchange.com/questions/528361/dash-not-expanding-glob-wildcards-in-chroot/582245#582245 At the end I ran "strace" on the python script which was doing the chroot call, in order to be able to debug this issue. – Étienne Apr 24 '20 at 15:02
  • @Étienne Thank you for letting me know! Which process are you running strace against? ls? – Philippe Apr 24 '20 at 15:11
  • @Philippe strace -f -v script.py, where script.py calls chroot using subprocess.Popen with shell=True and then passes the call to ls through Popen's communicate method. – Étienne Apr 24 '20 at 15:13

1 Answer

I wrote a minimal Python script that calls chroot the same way my build system does, and then I ran strace -f -v script.py on it.
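A minimal sketch of what such a script could look like is shown here. It is not the original script: the function names `build_launcher` and `run_in_chroot` are hypothetical, and it simply follows the description from the comments (subprocess.Popen with shell=True starting `chroot <dir> /bin/sh`, with the command sent on standard input via communicate). Running it for real needs root privileges and an existing chroot directory.

```python
import shlex
import subprocess
import sys


def build_launcher(chroot_dir):
    """Build the shell command line that starts /bin/sh inside the chroot."""
    return f"chroot {shlex.quote(chroot_dir)} /bin/sh"


def run_in_chroot(chroot_dir, command):
    """Send `command` on stdin to a shell started inside `chroot_dir`."""
    proc = subprocess.Popen(
        build_launcher(chroot_dir),
        shell=True,  # mirrors the shell=True call described in the comments
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    output, _ = proc.communicate(input=(command + "\n").encode())
    return proc.returncode, output.decode(errors="replace")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Example arguments (assumed): /mychroot 'ls /tmp/linux*deb'
        code, text = run_in_chroot(sys.argv[1], sys.argv[2])
        print(text, end="")
        print(f"exit status: {code}")
    else:
        print("usage: script.py CHROOT_DIR COMMAND")
```

Tracing the whole script with strace -f follows the child processes too, so the system calls made by the shell inside the chroot show up in the same trace.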

This showed me that the getdents system call was failing. After some searching, I found out that this is a glibc/kernel bug: getdents returns 64-bit values (on ext4 these values can be very large even if there are only a few files in the directory, because each value is a hash), but the caller expects 32-bit values: https://bugzilla.kernel.org/show_bug.cgi?id=205957

See also https://unix.stackexchange.com/questions/528361/dash-not-expanding-glob-wildcards-in-chroot

Étienne
  • What makes the bug happen in one case but not the other? – Philippe Apr 24 '20 at 16:18
  • Have you checked the size of the file system containing /tmp in both cases? – Philippe Apr 24 '20 at 16:25
  • @Philippe I think the bug happens only sometimes because a 64-bit value can still be small enough to fit in 32 bits – Étienne Apr 24 '20 at 19:33
  • @Philippe getdents returns a hash for the ext4 filesystem (it is a hack due to historical reasons), so the size of the filesystem is not the issue here. If the hash can be represented in 32 bits, the wildcard expansion succeeds; if the hash has a value which cannot be represented in 32 bits, getdents returns EOVERFLOW and the wildcard expansion fails. – Étienne Apr 24 '20 at 20:00
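The rule described in the last comment can be illustrated with a small sketch. This is only a model of the check, not the glibc code: the function names are hypothetical, the sample values are made up, and it assumes the 32-bit side stores the value in a signed 32-bit integer.

```python
import errno
import os

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def fits_in_signed_32(value):
    """Return True if `value` can be stored in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def narrow_to_32_bits(value):
    """Return `value` unchanged if it fits, otherwise fail with EOVERFLOW."""
    if not fits_in_signed_32(value):
        # Models the failure mode from the comment: the value is not truncated,
        # the whole call is rejected instead.
        raise OSError(errno.EOVERFLOW, os.strerror(errno.EOVERFLOW))
    return value


if __name__ == "__main__":
    # Made-up sample values: one small hash and one that needs more than 32 bits.
    for sample in (0x1234ABCD, 0x5A3C9F1000000000):
        try:
            narrow_to_32_bits(sample)
            print(f"{sample:#x}: fits, listing would succeed")
        except OSError as exc:
            print(f"{sample:#x}: {errno.errorcode[exc.errno]}, listing would fail")
```

Because the hash differs from file to file and from filesystem to filesystem, the same command can fall on either side of this check, which would match the intermittent failures.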