
I've set up a GitHub workflow that starts my Docker Compose stack headless (a Node container and a Postgres container) and then runs my Jest tests. About 70% of the time it succeeds: database connections have never been an issue, all secrets and configuration work, and all tests pass. The rest of the time the run fails with exit code 137 halfway through the tests, yet the Docker container's logs show the tests completing 100% successfully with a success code.

docker-compose.actions.yaml:

name: ***
networks:
  ***-network:
    external: false
services:
  auth-postgres:
    container_name: auth-postgres
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 1G
        reservations:
          cpus: "1"
          memory: 512M
    env_file:
      - auth/.env.docker
    image: postgres:15-alpine
    networks:
      - ***-network
    ports:
      - 5432:5432
    healthcheck:
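      # String form already runs through the container's shell; wrapping it in an unquoted `sh -c` would drop the -d/-U arguments.
      test: pg_isready -d "$${POSTGRES_DB}" -U "$${POSTGRES_USER}"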
      interval: 10s
      timeout: 60s
      retries: 5
  auth:
    container_name: auth
    deploy:
      resources:
        limits:
          cpus: "6"
          memory: 4G
        reservations:
          cpus: "2"
          memory: 2G
    env_file:
      - auth/.env.docker
    build:
      context: auth
      dockerfile: Dockerfile
    entrypoint:
      - ./tests-entrypoint.sh
    depends_on:
      auth-postgres:
        condition: service_healthy
    networks:
      - ***-network
    ports:
      - 1337:1337
    working_dir: /home/***/auth
    volumes:
      - /home/***/auth

Dockerfile:

FROM node:18

WORKDIR /home/***/auth
COPY *.sh .
COPY *.js .
COPY *.json . 
COPY src/ ./src
COPY prisma/ ./prisma

RUN chmod +x ./tests-entrypoint.sh
RUN chmod +x ./entrypoint.sh
RUN npm ci

tests.yaml:

name: Unit Tests

on: [pull_request]

jobs:
  unit-tests:
    runs-on: self-hosted
    env:
      # removed
    steps:
      - uses: actions/checkout@v3
      - name: Build docker compose
        run: printenv > auth/.env.docker && make dockerActions # dockerActions = docker compose up -d
      - name: Run tests
        run: docker exec $(docker ps --latest --quiet) /bin/bash -- ./tests-entrypoint.sh
      - name: Inspection
        if: always()
        run: docker inspect auth && docker inspect auth-postgres
      - name: Logs
        if: always()
        run: docker logs auth && docker logs auth-postgres

tests-entrypoint.sh:

#!/bin/bash

# Run the Jest suite and map its result to an explicit exit code.
if npm test
then
  echo "Tests job success."
  exit 0
else
  echo "Failure. " >&2
  exit 1
fi
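
One thing worth ruling out, given the files above: the `auth` service already runs `./tests-entrypoint.sh` as its entrypoint, so the `docker exec` step starts a second test run inside a container whose main process will exit on its own. If that is what happens here (an assumption, not something the logs confirm), Docker stops the container when the entrypoint finishes and the exec'd process is killed along with it, which could surface as 137 in the runner while the container itself records exit 0. A minimal sketch of the alternative, waiting on the container's own run instead of exec'ing into it; `wait_for_tests` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: block until the named container's entrypoint finishes,
# print its logs, and return the exit code Docker recorded for it.
wait_for_tests() {
  # `docker wait` blocks until the container stops, then prints its exit code.
  code=$(docker wait "$1") || return 1
  # Show the test output that the entrypoint wrote to stdout/stderr.
  docker logs "$1"
  return "$code"
}
```

In the workflow this would stand in for the `docker exec` line, e.g. `wait_for_tests auth`, so the step passes or fails on the container's own result.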

Screenshots: docker info; termination and the error inside GitHub Actions; the container's log showing exit 0; the container's inspection; the self-hosted runner's logs showing exit 137.

Is there a limit on the resources I'm allowed to use with a self-hosted runner? I run these tests in Docker locally during development and have never seen this error, even without specifying resource limits (usage doesn't come anywhere near the limits either). The confusing part is that even on the free-tier hosted runner it would work some of the time, and when it failed the container would still make it to the end with a success code. Now it's running on my own machine, and watching my resources during a run, it comes nowhere near exhausting the 12 cores and 32 GB of RAM.
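
For reference, exit codes above 128 conventionally mean the process was terminated by a signal, with the signal number being the code minus 128. A small sketch of that arithmetic (the function name is only illustrative):

```shell
#!/bin/sh
# Illustrative helper: derive the signal number from a shell exit code.
# Codes above 128 follow the 128 + signal convention; anything else
# is treated as a normal exit and reported as 0.
signal_from_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    echo $((code - 128))
  else
    echo 0
  fi
}
```

`signal_from_exit_code 137` prints `9`, i.e. SIGKILL. That is the signal the kernel OOM killer sends, but it is also what `docker kill` sends by default, so a 137 on its own does not prove a memory problem.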

lucid
  • Did you look at this [thread](https://stackoverflow.com/questions/59296801/docker-compose-exit-code-is-137-when-there-is-no-oom-exception)? What about OOM? Did you check its logs? Check with `sudo dmesg -T | grep -i killed` and also check the syslog. – Azeem Apr 28 '23 at 07:46
  • Also, you have configured `resources` too. Did you try increasing those values or testing without them? – Azeem Apr 28 '23 at 07:47
  • Played around with resources values before making the post, it would speed things up - but still unstable and gets terminated by the runner. The docker container never exits with an OOM error, it always shows exit 0 and false for OOM killed. The runner exits it with code 137 – lucid Apr 28 '23 at 13:52
  • Right. Does your `docker compose up -d` command contain any other flags about aborting or exiting? – Azeem Apr 28 '23 at 15:52
  • No, my make command is `docker compose up -d` – lucid Apr 28 '23 at 16:39
  • Do you have a sleep or some kind of loop to check if all services are up and running? Did you observe what kind of tests usually fail in this scenario? – Azeem Apr 28 '23 at 16:48
  • I don't have any sleeps or loops running, and the API container doesn't restart after the tests end. The tests never fail; they always fully succeed with an exit code of 0. The GitHub runner terminates the step prematurely while allowing the container to proceed. I made some progress by purging all containers and cache between workflows with `docker system prune --all --force` (I hadn't been cleaning up containers). I can reproduce the 137 error whenever I run the action without clearing the cache first. Now my actions run successfully for 2 runs at most, then hit 137 errors (with exit 0 in the Docker log) – lucid Apr 28 '23 at 17:17
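
The checks discussed in the comments (the container's exit code and OOM flag) can be read straight from `docker inspect`. A rough sketch, assuming the container names from the compose file; `report_container` is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: summarize how a container ended, using the
# State.ExitCode and State.OOMKilled fields reported by `docker inspect`.
report_container() {
  state=$(docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' "$1") || return 1
  exit_code=${state% *}   # text before the space
  oom=${state#* }         # text after the space
  if [ "$oom" = "true" ]; then
    echo "$1: OOM-killed (exit $exit_code)"
  elif [ "$exit_code" -gt 128 ]; then
    echo "$1: killed by signal $((exit_code - 128))"
  else
    echo "$1: exited with $exit_code"
  fi
}
```

Running `report_container auth` and `report_container auth-postgres` in the Inspection step would give a one-line summary per container instead of the full JSON dump.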
