I am using Kubernetes in Google Cloud (GKE).

I have an application that is hoarding memory, and I need to take a process dump as indicated here. Kubernetes is going to kill the pod when it reaches 512 MB of RAM.

So I connect to the pod

# kubectl exec -it stuff-7d8c5598ff-2kchk /bin/bash

And run:

# apt-get update && apt-get install procps && apt-get install gdb

Find the process I want:

root@stuff-7d8c5598ff-2kchk:/app# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  4.6  2.8 5318004 440268 ?      SLsl Oct11 532:18 dotnet stuff.Web.dll
root      114576  0.0  0.0  18212  3192 ?        Ss   17:23   0:00 /bin/bash
root      114583  0.0  0.0  36640  2844 ?        R+   17:23   0:00 ps aux

But when I try to dump...

root@stuff-7d8c5598ff-2kchk:/app# gcore 1
ptrace: Operation not permitted.
You can't do that without a process to debug.
The program is not being run.
gcore: failed to create core.1

I tried several solutions like this one, and they always end with the same result:

root@stuff-7d8c5598ff-2kchk:/app# echo 0 > /proc/sys/kernel/yama/ptrace_scope
bash: /proc/sys/kernel/yama/ptrace_scope: Read-only file system
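
For context, the file cannot be written from inside the container, but it can normally still be read. The sketch below interprets the value according to the kernel's Yama documentation; `describe_ptrace_scope` is a hypothetical helper name, not an existing tool, and the descriptions are abbreviated.

```shell
# Hypothetical helper: explain a Yama ptrace_scope value (0-3).
describe_ptrace_scope() {
  case "$1" in
    0) echo "classic: a process may attach to other processes of the same uid" ;;
    1) echo "restricted: only descendants may be traced, unless the tracer has CAP_SYS_PTRACE" ;;
    2) echo "admin-only: attaching requires CAP_SYS_PTRACE" ;;
    3) echo "no attach: ptrace attach is disabled" ;;
    *) echo "unknown value: $1" ;;
  esac
}

# Usage inside the pod (assumes the file is readable there):
# describe_ptrace_scope "$(cat /proc/sys/kernel/yama/ptrace_scope)"
```

With a value of 1 or 2, a shell started through kubectl exec is not an ancestor of PID 1, so gcore would need CAP_SYS_PTRACE to attach.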

I cannot find the way to connect to the pod and deal with this ptrace thing. I found that docker has a --privileged switch, but I cannot find anything similar for kubectl.

UPDATE: I found how to enable PTRACE:

apiVersion: v1
kind: Pod
metadata:
  name: <your-pod>
spec:
  shareProcessNamespace: true
  containers:
  - name: containerB
    image: <your-debugger-image>
    securityContext:
      capabilities:
        add:
        - SYS_PTRACE
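
One way to check whether the capability actually reached the container is to look at the effective capability mask in /proc. The following is a minimal sketch, assuming the mask is the hex string from the CapEff line of /proc/<pid>/status and that the shell supports 64-bit arithmetic (bash does); `has_sys_ptrace` is a hypothetical helper name. CAP_SYS_PTRACE is capability number 19 in the Linux headers.

```shell
# Hypothetical helper: report whether bit 19 (CAP_SYS_PTRACE) is set
# in a hex capability mask such as the CapEff value in /proc/<pid>/status.
has_sys_ptrace() {
  mask=$1
  if [ $(( (0x$mask >> 19) & 1 )) -eq 1 ]; then
    echo yes
  else
    echo no
  fi
}

# Usage inside the pod:
# has_sys_ptrace "$(awk '/^CapEff/ {print $2}' /proc/self/status)"
```

If this prints "no" after the pod has been redeployed, the securityContext was probably not applied to the container you are running the shell in.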

Get the process dump:

root@stuff-6cd8848797-klrwr:/app# gcore 1
[New LWP 9]
[New LWP 10]
[New LWP 13]
[New LWP 14]
[New LWP 15]
[New LWP 16]
[New LWP 17]
[New LWP 18]
[New LWP 19]
[New LWP 20]
[New LWP 22]
[New LWP 24]
[New LWP 25]
[New LWP 27]
[New LWP 74]
[New LWP 100]
[New LWP 753]
[New LWP 756]
[New LWP 765]
[New LWP 772]
[New LWP 814]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
warning: target file /proc/1/cmdline contained unexpected null characters
Saved corefile core.1

Funny thing, I cannot find lldb-3.6, so I installed lldb-3.8:

root@stuff-6cd8848797-klrwr:/app# apt-get update && apt-get install lldb-3.6
Hit:1 http://security.debian.org/debian-security stretch/updates InRelease
Ign:2 http://cdn-fastly.deb.debian.org/debian stretch InRelease
Hit:3 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease
Hit:4 http://cdn-fastly.deb.debian.org/debian stretch Release
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'python-lldb-3.6' for regex 'lldb-3.6'
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Find SOS plugin:

root@stuff-6cd8848797-klrwr:/app# find /usr -name libsosplugin.so
/usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/libsosplugin.so

Run lldb...

root@stuff-6cd8848797-klrwr:/app# lldb `which dotnet` -c core.1
(lldb) target create "/usr/bin/dotnet" --core "core.1"

But it gets stuck forever; the prompt never returns to (lldb).

Vlad
  • If you have access to the host machine, you can use [nsenter](http://man7.org/linux/man-pages/man1/nsenter.1.html) to run the command from the host. I don't know how GKE works in that regard, though. – mw007 Oct 19 '18 at 21:51
  • @vlad I am still getting the "ptrace: Operation not permitted." error even after using the securityContext. Any guess, why? – Ajay Sainy Oct 11 '19 at 19:14

1 Answer

I had a similar issue. Try installing the correct version of LLDB. The SOS plugin from a given dotnet version is linked against a specific version of LLDB. For example, dotnet 2.0.5 is linked with LLDB 3.6, and 2.1.5 is linked with LLDB 3.9. This document might also be helpful: Debugging CoreCLR

Note that not all versions of LLDB are available for every OS. For example, LLDB 3.6 is unavailable on Debian but available on Ubuntu.
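
As a rough illustration, the runtime version can be read off the directory that contains libsosplugin.so and then matched against the pairs mentioned above. This is only a sketch: the function names are hypothetical, and the table contains just the two pairs stated in this answer, so any other version is reported as unknown.

```shell
# Hypothetical helper: extract the runtime version from a path such as
# /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/libsosplugin.so
dotnet_version_from_sos_path() {
  basename "$(dirname "$1")"
}

# Hypothetical helper: map a dotnet version to the LLDB version its SOS
# plugin is linked with. Only the two pairs named in this answer are listed.
lldb_for_dotnet() {
  case "$1" in
    2.0.5) echo "3.6" ;;
    2.1.5) echo "3.9" ;;
    *) echo "unknown" ;;
  esac
}

# Usage:
# lldb_for_dotnet "$(dotnet_version_from_sos_path "$(find /usr -name libsosplugin.so)")"
```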

  • Note that I do not even get as far as loading SOS; lldb itself hangs before that. I tried to load SOS in lldb first, but it also hangs, both when attaching to a live process and when loading a dump. – Vlad Nov 15 '18 at 09:51
  • @Vlad It seems to be a bug in lldb: https://github.com/nodejs/llnode/issues/61 . Try using lldb 4.0. I was able to load a dump with .net 2.1.6+ lldb 4.0 https://gist.github.com/segor/dd98f3de05b23529af561ec4ed1305f7 – Serghei Gorodetki Nov 19 '18 at 10:49