
According to this article: https://www.codeword.xyz/2015/09/02/three-ways-to-script-processes-in-parallel/

With `wait`, I can do something like this:

#!/bin/sh

/usr/bin/my-process-1 --args1 &
/usr/bin/my-process-2 --args2 &
/usr/bin/my-process-3 --args3 &

wait
echo all processes complete

Let's say I have hundreds of processes. Will each process be started concurrently in a separate thread?

What determines the limit of threads that can be started?

techguy2000
  • A thread is a construct for managing virtual execution paths within a single process, not something that can *contain* a process. – chepner Dec 20 '18 at 04:45
  • This creates full processes, not threads. I suspect that detail may not matter to the OP, but best to differentiate for anyone reading these later. See chepner's comment, and c.f. [this Q&A](https://stackoverflow.com/questions/200469/what-is-the-difference-between-a-process-and-a-thread) for more breakdown. – Paul Hodges Dec 20 '18 at 14:18
  • So the trivial answer is "yes, every process runs exactly one thread". – tripleee Dec 21 '18 at 07:00
  • You can check the upper limit on the number of processes you can run with the command `ulimit -u`. As the article you linked to suggests, you gain a lot of flexibility by using **GNU Parallel** where you can add `-j` to set how many jobs run concurrently. – Mark Setchell Dec 21 '18 at 09:12
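GNU Parallel's `-j` flag, suggested in the comment above, is one way to cap how many jobs run at once. A minimal sketch, with `echo` standing in for a real command and a guard in case GNU Parallel isn't installed:

```shell
# Run at most 4 jobs at a time; {} is replaced by each input line.
if command -v parallel >/dev/null 2>&1; then
    seq 1 10 | parallel -j 4 echo started job {}
else
    echo "GNU Parallel not installed"
fi
```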

1 Answer


First, processes are not threads. A process contains one or more threads. It starts with one thread, and may create more if it so chooses. It's a similar idea, but threads within a process share the process's address space (each thread can access/share the program's variables). Processes each get a separate address space, and are unable to access memory outside what the OS has assigned to them.
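You can see the separate-address-space behavior in shell itself: a background subshell is a separate process that gets a copy of the parent's variables, so changes it makes never reach the parent:

```shell
x=parent
( x=child ) &    # the subshell is a separate process with its own copy
wait             # wait for the subshell to finish
echo "$x"        # still prints: parent
```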

Any program you run gets a process. When you run a script, the language interpreter--be it shell, Python, whatever--gets invoked to execute the script, and that's a process. The difference between a program and a process is that the process is the running instance. So if you have 3 terminals open running bash, you have 3 processes running the one program. Note this doesn't necessarily mean windows: my mail program can have several windows open, but it's all still done by one process.

Yes, you can start numerous concurrent processes. Limits are imposed by the OS. 32K is a common limit, but different flavors of Unix/Linux support different process counts. There's usually also a per-user process limit, unless you're root.
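You can inspect these limits yourself. The `/proc` paths below are Linux-specific; on other Unix flavors the equivalents live elsewhere (e.g. `sysctl` on BSD/macOS):

```shell
# Per-user process limit (a shell built-in; may print "unlimited"):
ulimit -u

# System-wide ceilings on Linux (these /proc paths are Linux-specific):
cat /proc/sys/kernel/pid_max 2>/dev/null
cat /proc/sys/kernel/threads-max 2>/dev/null
```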

In practice, concurrent process count is also limited by available memory and CPU. If you have 4GB of RAM, and you've got a program where each process/instance takes up 500K, you could run about 6000 copies before you exhaust RAM (500K*6000 copies = 3GB, and the OS needs some for itself). Your system will rely on its swapfile at this point, but you're going to encounter thrashing if all these processes are trying to run. If you do this to your SSD, you will shorten its life.
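To run the same back-of-the-envelope arithmetic against your own machine, you can read available memory and divide by the assumed 500K footprint; `/proc/meminfo` is Linux-specific, so this is only a sketch:

```shell
# Estimate how many 500K-resident processes fit in available RAM:
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo 2>/dev/null)
if [ -n "$avail_kb" ]; then
    echo "roughly $((avail_kb / 500)) copies before RAM is exhausted"
else
    echo "no /proc/meminfo on this system"
fi
```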

And, unless you've got a supercomputer with hundreds or thousands of processors, no more than a few concurrent, CPU-intensive processes are practical. If you start 100 CPU-intensive ("CPU bound") processes on a 4-core machine, the OS will spread core time over all 100 using time slicing, so each process will run at 4 cores/100 processes = 1/25 the rate it would run had it a core to itself. You won't get more done by forking thousands of concurrent processes, unless you have the hardware to actually do the work.
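If you do need to launch hundreds of jobs from a script like the one in the question, a plain-shell way to keep only a core's-worth running at once is a small job pool. This sketch assumes bash 4.3+ for `wait -n`, and `sleep` stands in for the real work:

```shell
#!/bin/bash
max_jobs=4                       # e.g. one job per core
for i in $(seq 1 20); do
    # When the pool is full, block until any one job finishes:
    while [ "$(jobs -r | wc -l)" -ge "$max_jobs" ]; do
        wait -n
    done
    sleep 0.1 &                  # stand-in for my-process-$i --args$i
done
wait                             # catch the stragglers
echo all processes complete
```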

The flipside of being CPU bound is being I/O bound---suppose you want to mirror a website, so you're going to try downloading all 1000 pages in parallel. It's not going to be any faster than a limited number of parallel connections each grabbing items sequentially, because only so many bits can flow through the network. Once you saturate the network, more concurrency won't make anything faster.
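That "limited number of parallel connections" pattern is commonly done with `xargs -P` (not POSIX, but supported by GNU and BSD/macOS xargs). Here `echo` stands in for the real download command, and the URL list is made up:

```shell
# Build a stand-in URL list:
printf 'http://example.com/page%s\n' 1 2 3 4 5 > /tmp/urls.txt

# Handle at most 8 URLs at a time; swap `echo fetching` for `curl -s -O`:
xargs -n 1 -P 8 echo fetching < /tmp/urls.txt
```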

You can use ps to list your personal processes, or ps -ef or ps aux to view all processes. There are many: as I'm writing this, my system has 235, but most of them are idle: terminals I'm not using at the moment, networking support, audio support at-the-ready in case it's called on, the web browser I'm writing in, the compositor that updates the screen when asked to by the web browser. You can learn a lot about your OS by looking through this list, and looking up what various programs do/what services they provide. This is where you see your OS is not one big black box, but a collection of many programs/processes, each providing some limited functionality, but together provide most of the OS services.
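To see the count on your own system (exact `ps` flags vary slightly between Linux and BSD/macOS, so treat these as a sketch):

```shell
# Total processes; subtract 1 for ps's header line:
ps -e | awk 'END {print NR - 1}'

# Only your own processes:
ps -u "$(id -un)" | awk 'END {print NR - 1}'
```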

Perette
  • So normally 32K processes can be run concurrently? Is a process the equivalent of a command or a script? I was running multiple scripts with wait. When observing on my MacBook, I see 3 scripts appear to run concurrently based on console output, and then another 3 scripts run. Is that expected? – techguy2000 Dec 20 '18 at 17:01
  • What do you mean by "concurrently"? One process at a time gets the CPU and the rest are queued up by the scheduler. If you have multiple CPUs then that many processes can genuinely run concurrently. – tripleee Dec 21 '18 at 07:01
  • @techguy2000: How are you seeing another 3? `ps`? Note that when running shell scripts, you may see programs that are part of the script running. When running a script containing `/bin/sleep 15`, for example, `ps` will show both `sh` (the script interpreter) and `sleep` (the command that's part of the script) while it's interpreting that portion of the script. – Perette Dec 21 '18 at 07:10
  • @tripleee: When there are more concurrent processes than cores (and I'm going to ignore threads for simplicity), then you're right, some of those will need to be queued. The OS will round-robin between them to create an illusion of concurrency, although it's really just [time-slicing](https://en.wikipedia.org/wiki/Preemption_(computing)#Time_slice). – Perette Dec 21 '18 at 07:17
  • @PeretteBarella time-slicing makes sense and i think that's what i saw, just with naked eyes watching the output from my terminal. I saw 3 "print statements" first, and the computers working on the 3 scripts, and after a second or so, another 3 "print statements" and then some more computer processing. – techguy2000 Dec 21 '18 at 16:04
  • @PeretteBarella So going back to my original question: let's say each script takes 2 seconds to finish. When I have 1000 of them running with wait, it doesn't mean it will only take 2 seconds to finish all 1000 scripts; depending on how many cores or how much processing power the computer has, it may take much more time than just a few seconds, right? – techguy2000 Dec 21 '18 at 16:05
  • Correct. If each script takes 2 seconds on its own, and you run them in parallel on, say, a 4-core machine, I'll expect them to take approximately 2 seconds*1000 instances/4 cores=500 seconds. Note that each script starts when its `/usr/bin/my-process-1 --args1 &` line is interpreted in your main script. `wait` is just making sure they've all completed before the main script continues on---it doesn't make them run. The `wait` and `echo all processes complete` let *you* know they finished, but the scripts will run without them, just without notifying of completion. – Perette Dec 21 '18 at 20:38