
I'm running Python scripts as child processes spawned from Node.js.

When running locally, or locally in a Docker / Kubernetes installation, the script works as expected and completes all of its functions. When the container runs in Kubernetes on Azure, the script silently stops / fails at just under 1 hour, without any exceptions or errors being logged.

Memory & CPU usage stay below 30% at most, and the container as a whole doesn't fail. Running ps -fA | grep python shows the script running after it has been spawned; the script no longer appears once it fails / stops silently. The 'exit' and 'close' events in Node.js for the spawned process do not fire.

Any advice on how to troubleshoot would be much appreciated.

EDIT: Node.js spawn

import {/* inject, */ BindingScope, injectable} from '@loopback/core';
import * as path from 'path';
import {spawn} from 'child_process';

@injectable({scope: BindingScope.TRANSIENT})
export class PythonService {
  constructor() {} 
  stopPython(valuationId: string) {}

  executePython(id: string) {
    const filepath = path.resolve(process.env.PY_PATH);

    const ls = spawn('python', [filepath, id]);

    ls.stdout.on('data', function (data) {
      console.log('stdout: ' + data.toString());
    });

    ls.stderr.on('data', function (data) {
      console.log('stderr: ' + data.toString());
    });

    // 'error' fires on the child process itself, e.g. when it cannot be spawned or killed
    ls.on('error', function (err: Error) {
      console.log('error: ' + err.toString());
    });

    ls.on('exit', function (code) {
      // code can be null if the process was terminated by a signal
      console.log('child process exited with code ' + code);
    });

    ls.on('close', code => {
      console.log(`child process exited with code ${code}`);
    });
  }
}
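
If the child is killed by a signal (for example SIGKILL from the kernel OOM killer) while the Node.js parent stays alive, the 'exit' and 'close' callbacks receive a null exit code and the signal name as a second argument. A minimal sketch of a signal-aware listener for the same ls child process as above (not part of the original service):

    ls.on('exit', (code: number | null, signal: NodeJS.Signals | null) => {
      // code is null when the child was terminated by a signal such as SIGKILL
      console.log(`child process exited with code ${code}, signal ${signal}`);
    });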

EDIT: Dockerfile

# Pull base image
FROM python:3.7-slim

# Set installation environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV NODE_VERSION=12.20.0

# Install NVM for later use to install Node and NPM
RUN apt-get update && apt-get install -y curl
RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
ENV NVM_DIR=/root/.nvm
RUN . "$NVM_DIR/nvm.sh" && nvm install ${NODE_VERSION}
RUN . "$NVM_DIR/nvm.sh" && nvm use v${NODE_VERSION}
RUN . "$NVM_DIR/nvm.sh" && nvm alias default v${NODE_VERSION}
ENV PATH="/root/.nvm/versions/node/v${NODE_VERSION}/bin/:${PATH}"

# Create app directory (with user `node`)
RUN mkdir -p /home/node/app

# Set work directory
WORKDIR /home/node/app

# Install python dependencies
COPY  requirements.txt /home/node/app/
RUN pip install -r requirements.txt
RUN pip install swifter

# Install node app dependencies
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available (npm@5+)
COPY  package*.json ./
RUN npm install

# Bundle app source code
COPY . .

# Build node app
RUN  npm run build

# Expose ports
EXPOSE ${DB_PORT}
EXPOSE ${API_PORT}
EXPOSE ${SOCKET_PORT}

CMD [ "node", "." ]

Python v3.7.11, Node.js v12.20

  • Can you show us what the script is that you're running and how? – C.Nivs Feb 10 '22 at 02:13
  • The script, the image, how you spawn the script? Print some logs in your script? – caimaoy Feb 10 '22 at 02:23
  • @C.Nivs unfortunately it's a client's Py script, but I edited to add the Node.js spawn code. The stdout, stderr and error events trigger and print successfully; exit and close do not fire. – chS Feb 10 '22 at 07:24
  • @caimaoy It's a client's Py script unfortunately, so I can't post it, but I've edited to add the Node.js spawn and image build info. This only happens on larger datasets though, +- 1 mil rows of data. All console logs print normally up to the point where the process is killed; no errors / exceptions / info are logged at the point where the process stops. – chS Feb 10 '22 at 07:31
  • So is docker/kubernetes killing it? Are either of them saying that there's an OOM issue? – C.Nivs Feb 10 '22 at 15:21
  • @C.Nivs at the moment, no idea. Digging through syslogs on the container to find anything that can help me troubleshoot. Nothing is logged via the Nodejs or Py exception handling & logging at the point that the script stops – chS Feb 10 '22 at 16:23

1 Answer

The kernel OOM killer was terminating the Python processes due to high memory usage. I was able to find the OOM errors in the system logs by shelling into the pod, then using dmesg for the kill logs and ps aux --sort -pmem to see memory usage inside the pod.

The reason for the OOM was that the memory allocated to Node.js was considerably higher than the usual default of around 2 GB, which left too little memory available for Python. Decreasing the Node.js memory allocation, or removing the exclusive Node.js memory allocation, solved the issue.
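
One way to decrease the Node.js memory allocation, assuming the Dockerfile above, is to cap the V8 heap with the --max-old-space-size flag in the CMD; the 2048 MB value below is only illustrative and should be sized to leave headroom for Python within the pod's memory limit:

CMD [ "node", "--max-old-space-size=2048", "." ]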
