2

I have a Python program consisting of 5 processes outside of the main process. Now I'm looking to get an AWS server or something similar on which I can run the script. But how can I find out how many vCPU cores are used by the script, i.e. how many are needed? I have looked at:

import multiprocessing

multiprocessing.cpu_count()

But it seems that this just returns the number of CPUs on the system. I just need to know how many vCPU cores the script uses.

Thanks for your time.

EDIT:

Just for some more information: the processes run indefinitely.

Zercon
  • 1
    The script should use 1 per process * 6 processes = 6. Maybe an AWS server (EC2) does something funky, I don't know. You could deploy to an EC2 node and monitor with a tool like `cpustat`. – Michael Ruth Aug 18 '22 at 21:15
  • 3
    We can't answer that question. CPU usage is incredibly dynamic, and you're sharing those CPUs with hundreds of other processes. If 4 of your processes spend most of their time waiting for a response from a web server, then you might only be using 1 CPU. – Tim Roberts Aug 18 '22 at 21:24

5 Answers

2

The answer to this post probably lies in the following question:

Multiprocessing : More processes than cpu.count

In short, you probably have hundreds of processes running, but that doesn't mean you will use hundreds of cores. It all depends on utilization and the workload of the processes.

You can also get some additional info from the psutil module:

import psutil

print(psutil.cpu_percent())  # system-wide CPU utilization in percent
print(psutil.cpu_stats())    # context switches, interrupts, syscalls
print(psutil.cpu_freq())     # current, min and max CPU frequency

or use the os module together with psutil to get the current CPU load in Python:

import os
import psutil

# getloadavg() returns the 1, 5 and 15 minute load averages; dividing the
# 15-minute average by the core count gives a utilization percentage.
l1, l2, l3 = psutil.getloadavg()
CPU_use = (l3 / os.cpu_count()) * 100

print(CPU_use)
  • Credit: DelftStack

Edit

There might be some useful information for you in the following Medium article on monitoring the memory usage of a running Python program; some of the tools it covers touch on CPU usage too. https://medium.com/survata-engineering-blog/monitoring-memory-usage-of-a-running-python-program-49f027e3d1ba

Edit 2

A good guideline for how many processes to start depends on the number of hardware threads available: basically thread_count + 1. This ensures your processor doesn't just sit around and wait. The heuristic works best when you are I/O-bound, e.g. waiting for files from disk: while a process is blocked on I/O it is effectively locked, so with, say, 8 threads you have 8 other processes ready to take over. The one extra is redundancy: if all 8 are blocked, the one that's left can take over right away. You can increase or decrease this number as you see fit; a sketch follows below.
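As a rough sketch of that heuristic (the file names here are just placeholders for some I/O-bound task), sizing a pool off the hardware thread count might look like this:

import os
from concurrent.futures import ProcessPoolExecutor

def handle(path):
    # Stand-in for an I/O-bound task, e.g. reading a file from disk.
    with open(path, "rb") as f:
        return len(f.read())

if __name__ == "__main__":
    # Heuristic from above: one worker per hardware thread, plus one spare.
    workers = (os.cpu_count() or 1) + 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        print(list(pool.map(handle, ["a.txt", "b.txt", "c.txt"])))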

nigel239
  • While this is helpful, it does not provide me with the information to determine how many vCPU cores I would need. – Zercon Aug 18 '22 at 21:29
  • 1
    You would need to do some analytical tests outside of the Python environment. The best you could do is check the CPU usage for the process ID (`PID`) of Python with a tool written in C, Go, C++, etc., record that, and make an estimate. You would need to put extreme load onto your Python application for these tests. Keep in mind how often you poll the process usage; polling too frequently might itself impact the tests. – nigel239 Aug 18 '22 at 21:32
2

On Linux you can use the "top" command at the command line to monitor the real-time activity of all threads of a process id:

top -H -p <process id>
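If you would rather stay in Python, psutil can give a similar per-thread view; a minimal sketch (the PID here is a placeholder for your script's actual process id):

import psutil

proc = psutil.Process(12345)                # hypothetical PID of your script
print(proc.cpu_percent(interval=1.0))       # % of one CPU over a 1 s window
for t in proc.threads():                    # per-thread user/system CPU times
    print(t.id, t.user_time, t.system_time)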
C. Pappy
  • 739
  • 4
  • 13
2

Your question uses some general terms and leaves much unspecified, so answers must be general.

It is assumed you are managing the processes using either `multiprocessing.Process` directly or `ProcessPoolExecutor`.

In most cases a vCPU is a logical processor, but per the following link there are services offering fractional-vCPU configurations, such as those in shared environments...

What is vCPU in AWS

You mention/ask...

... Now I'm looking to get an AWS server or something similar on which I can run the script. ...

... But how can I find out how many vCPU cores are used by the script/how many are needed? ...

You state AWS or something like it. The answer depends on what your subprocesses do, and how much of a vCPU, or fractional vCPU, each subprocess needs. Generally, a vCPU is analogous to a logical processor upon which a thread can execute. A fractional vCPU gives you some limited share of an otherwise "full" vCPU's usage.

How many vCPUs (or fractional vCPUs) your subprocesses need really depends on what those subprocesses do. If one subprocess sits waiting on I/O most of the time, you hardly need a dedicated vCPU for it.

I recommend starting with some minimal, least-expensive configuration and seeing how it works with your app's expected workload. If you are not happy, increase the configuration as needed.

If it helps...

I usually use subprocesses when I need simultaneous execution that avoids Python's GIL limitations, breaking the work into subprocesses. I generally use a single active thread per subprocess; any other threads in the same subprocess are usually waiting on I/O and do not otherwise compete with the primary active thread. Of course, a subprocess could be dedicated to I/O if you want to separate such work from threads you place in other subprocesses.
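A minimal sketch of that pattern, assuming one CPU-bound worker per subprocess (the worker body is just an illustrative busy loop):

from multiprocessing import Process

def cpu_worker(n):
    # One active CPU-bound thread per subprocess; each process has its own
    # interpreter, so the GIL does not serialize them against each other.
    total = 0
    for i in range(n):
        total += i * i
    print(total)

if __name__ == "__main__":
    procs = [Process(target=cpu_worker, args=(10_000_000,)) for _ in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()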

Since we do not know your app's purpose, architecture and many other factors, it's hard to say more than the generalities above.

Ashley
1

Your computer has hundreds if not thousands of processes running at any given point. How does it handle all of those with only a handful of cores? The thing is, each core takes a process for a certain amount of time, or until it has nothing left to do in that process.

For example, if I create a script that calculates the square root of all numbers from 1 to, say, a billion, you will see a single core hit max usage; then a split second later another core hits max while the first drops back to normal, and so on until the calculation is done.
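A toy version of that experiment (scaled down from a billion so it finishes quickly); watch a system monitor while it runs and you should see one core pegged at a time:

import math

# CPU-bound loop: one core sits near 100% while this runs, and the OS may
# migrate the work between cores as it reschedules the process.
for n in range(1, 10_000_000):
    math.sqrt(n)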

Or, if the process waits on an I/O operation, the core has nothing to do, so it drops that process and moves on to another one. When the I/O operation completes, a core can pick the process back up and get back to work.

You can run your multiprocessing Python code on a single core or on 100 cores; you can't really do much about it. However, on Windows you can set the affinity of a process, which gives the process access to certain cores only. So when the processes start, you can set each one's affinity to, say, core 1, or each one to a separate core. Not sure how you do that on Linux though.
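As an aside, psutil exposes affinity on both Windows and Linux (and on Linux the standard library also has os.sched_setaffinity); a small sketch:

import psutil

p = psutil.Process()        # current process
print(p.cpu_affinity())     # cores this process may run on, e.g. [0, 1, 2, 3]
p.cpu_affinity([0])         # pin the process to core 0 only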

In conclusion, if you want a short and direct answer: as many cores as the script has access to. Whether you give the processes one core or 200 cores, they will still work. However, performance may degrade if the processes are CPU-intensive, so I recommend starting with one core on AWS, checking performance, and upgrading if needed.

Mohamed Yasser
  • Yes, you can use the cores. But OP wants to know how much of the cores he is utilizing. (And if it would warrant an upgrade) – nigel239 Aug 18 '22 at 22:00
  • 1
    He wants to know how many cores should he get on his AWS Server, all 5 processes can work on a single core; although that can affect the performance of other processes on the server. – Mohamed Yasser Aug 18 '22 at 22:14
  • 1
    Yes, but once he scales, or the processes are more intensive, you will not want that on a single core, and will want to spread the load. You can force all processes onto one core, but performance will be far worse if multiprocessing is properly implemented. It's also always good to know how much your processes utilize, just to prepare. He's also asking, how many cores. Beyond certain utilization, overall performance will drop, hence, `more core` – nigel239 Aug 18 '22 at 22:17
  • 2
    In the first case, you can upgrade to more cores, I agree with that. But that is once that happens. I don't really know how intensive the background processes are on AWS server, but I reckon they would be minimal otherwise they are just scamming people out of cores. Unless someone comes up with a way to know EXACTLY how much CPU each process uses (not the total CPU usage for ALL the processes) I recommend starting with one, checking the performance and upgrading if needed. – Mohamed Yasser Aug 18 '22 at 22:30
1

I'll try to do my own summary of "I just need to know how many vCPU cores the script uses".

There is no way to answer that properly other than running your app and monitoring its resource usage. Assuming your Python processes do not spawn subprocesses (which could even be multithreaded applications), all we can say is that your app won't utilize more than 6 cores (as per the total number of processes). There are plenty of ways for a program to under-utilize CPU cores, like waiting for I/O (disk or network) or interprocess synchronization (shared resources). So to get any real understanding of CPU utilization, you need to measure actual performance (e.g., with the htop utility on Linux or macOS) and investigate the causes of any underperformance.
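For instance, one rough way to turn such a measurement into a core estimate with psutil (a sketch; run it from inside your main process, or pass your script's PID to psutil.Process):

import time
import psutil

# Estimate how many cores the whole process tree is using: total CPU
# percent across parent and children, divided by 100.
root = psutil.Process()
procs = [root] + root.children(recursive=True)
for p in procs:
    p.cpu_percent(None)                      # prime the per-process counters
time.sleep(5)                                # measurement window
total = sum(p.cpu_percent(None) for p in procs)
print(f"~{total / 100:.1f} vCPUs in use")    # e.g. 600% over 6 processes ~ 6 cores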

Hope it helps.

Nikolaj Š.