
I am running a single-instance worker on AWS Elastic Beanstalk. It is a single-container Docker environment that runs a few processes once every business day. The processes mostly sync a large number of small files from S3 and analyze them.

The setup runs fine for about a week, and then CPU load starts growing linearly in time, as in this screenshot.

[Screenshot: AWS Elastic Beanstalk CPU load chart]

The CPU load stays at a considerable level, slowing down my scheduled processes. At the same time, the top-resource tracking I run inside the container (the container runs in privileged Docker mode to enable this):

echo "%CPU %MEM ARGS $(date)" && ps -e -o pcpu,pmem,args --sort=pcpu | cut -d" " -f1-5 | tail

shows almost no CPU load (it changes only while my daily process runs, and at those times it seems to reflect system load accurately).
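
Something like the following wrapper (a sketch; the log path and the five-minute interval are arbitrary choices) can run that tracking periodically and also record the aggregate counters from /proc/stat, so the in-container view can be lined up against the CloudWatch chart:

    #!/bin/sh
    # Sketch only: periodically log top CPU consumers plus the aggregate /proc/stat counters.
    # The log path and the 5-minute interval are arbitrary.
    LOG=/var/log/cpu-tracking.log
    while true; do
        {
            echo "%CPU %MEM ARGS $(date)"
            ps -e -o pcpu,pmem,args --sort=pcpu | cut -d" " -f1-5 | tail
            # First /proc/stat line: cumulative user/nice/system/idle/iowait/irq/softirq/steal jiffies
            grep '^cpu ' /proc/stat
        } >> "$LOG"
        sleep 300
    done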

What am I missing about the origin of this "background" system load? Has anybody seen similar behavior, and/or could you suggest additional diagnostics to run from inside the container?

So far I have been restarting the setup every week to clear the "background" load, but that is suboptimal, since the first run after each restart has to collect over 1 million small files from S3 (whereas subsequent daily runs add only a few thousand files each).


1 Answer


The profile is a bit odd, especially the linear growth; it is almost as if something is accumulating and taking progressively longer to process. I don't have enough information to point at a specific issue, but here are a few things you could check:

  • Are you collecting files anywhere, whether intentionally or in a cache or transfer folder? It could be that the system is running background processes (antivirus, indexing, defragmentation, deduplication, etc.) and the "large number of small files" is accumulating into something that has to be paged through or handled inefficiently.

  • Does any part of your process use a weekly naming convention or housekeeping step? Might you be getting conflicts, or an accumulating workload, as the week rolls over? I.e., the second week is actually processing both the first and second weeks' data but never completing, so each subsequent day it gets progressively worse. I saw something similar where an ill-suited bubble sort never completed (the slow but steady inflow of data kept resetting it before it could reach its completion condition), and the process's demand grew steadily as the array got larger.

  • Do you have any logging on a weekly rollover cycle?

  • Are there any other key performance metrics following the same trend (network, disk I/O, memory, paging, etc.)?

  • Do consider whether it is a false positive. If the CPU really is that busy, there should be other metrics mirroring its behaviour: cache use, disk I/O, S3 transfer statistics/logging. A rough checking script is sketched below this list.
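
Something along the following lines (a sketch only; the sync directory path is a placeholder, and vmstat may need installing in a slim container image) would snapshot several of those metrics from inside the container in one go:

    #!/bin/sh
    # Sketch only: snapshot metrics that should mirror a genuinely busy CPU.
    # DATA_DIR is a placeholder for wherever the S3-synced files live.
    DATA_DIR=/data/s3-sync

    echo "=== $(date) ==="

    echo "file count under $DATA_DIR (is the working set growing?):"
    find "$DATA_DIR" -type f | wc -l

    echo "disk usage:"
    df -h "$DATA_DIR"

    echo "memory and swap (heavy paging would show here):"
    free -m

    echo "CPU breakdown and block I/O, five 1-second samples (watch the wa and st columns):"
    vmstat 1 5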

RL

  • The odd thing is that I do not run any weekly processes, and network in/out stays near-zero while CPU load grows. The files synced from S3 stay on the SSD drive of the docker host instance. Is there some other set of commands I could run in a shell script to diagnose where the CPU load comes from? Somehow a process that eats up system resources needs to be identified... – Pavel Feb 26 '17 at 07:19
  • I will talk to some NIX friends and get back to you after work. RL – Polymath Feb 26 '17 at 18:21
  • The best suggestions were to use 'top', or to get a trial licence of a more sophisticated monitoring tool. top is good, but I am not sure you will get much more out of it. RL – Polymath Feb 27 '17 at 08:39
  • My understanding is that top would provide essentially the [same info as ps](http://unix.stackexchange.com/questions/62176/what-is-the-difference-between-ps-and-top-command). The problem is that the "background" CPU load is not reflected in the output of system usage commands. Thank you @Polymath for summarizing the possibilities; I'll explore other metrics that mirror the CPU pattern. – Pavel Feb 28 '17 at 06:14