I'm experimenting nodejs worker threads usage. I'm experiencing weird latency elapsed times.
- I have a
main.js
that spawn a rapid sequence of worker threads. - Each
worker.js
exceute a cpu-boud computation (generate prime numbers);
BTW, the
generatePrimes()
javascipt function is is just a demo example of cpu-bound calculation. In my real case, the worker thread is a nodejs program that bind an c++ library (doing a speech recognition, during half a second elapsed with 100% CPU).
my PC laptop: Ubuntu 20.04.2 LTS desktop environment, has 8 cores:
$ inxi -C -M Machine: Type: Laptop System: HP product: HP Laptop 17-by1xxx v: Type1ProductConfigId serial: <superuser/root required> Mobo: HP model: 8531 v: 17.16 serial: <superuser/root required> UEFI: Insyde v: F.32 date: 12/14/2018 CPU: Topology: Quad Core model: Intel Core i7-8565U bits: 64 type: MT MCP L2 cache: 8192 KiB Speed: 700 MHz min/max: 400/4600 MHz Core speeds (MHz): 1: 700 2: 700 3: 700 4: 700 5: 700 6: 700 7: 700 8: 700
$ echo "CPU threads: $(grep -c processor /proc/cpuinfo)" CPU threads: 8
I experienced that computation has a total elapsed of ~8 seconds, when
worker.js
run independently (single thread) calling function:generatePrimes(2, 1e7)
QUESTION
When I spawn multiple threads, e.g. 6 threads, almost in parallel (see code here below), I was expected to have ~8 seconds again (maybe with a small overhead), independently from the number of spawned threads (don't they run in parallel, having enough cpu cores?). Instead I got a total elapsed more big of foreseen ~8 seconds. I summarize more than ~20 seconds ?! Why?
Here below source codes and some elapsed measurements using time/pidstat:
main.js
// main.js
const { Worker } = require('worker_threads')
function runThread(workerData) {
return new Promise((resolve, reject) => {
const worker = new Worker('./worker.js', { workerData })
worker.on('message', resolve)
worker.on('error', reject)
worker.on('exit', (code) => {
if (code !== 0)
reject(new Error(`Worker stopped with exit code ${code}`))
})
})
}
async function main() {
const numThreads = + process.argv[2]
if ( !numThreads || numThreads < 1 ) {
console.error(`usage: ${process.argv[1]} number_of_threads`)
process.exit()
}
const min = 2
const max = 1e7
//
// run multiple threads, in "parallel":
//
// It simulates a rapid spawn ("parallel") of a specific number of thread computation.
// The main thread run numThreads times the same worker thread.
//
// Data results of each thread elaboration is just "done"
//
for (let i = 0; i < numeThreads; i++ )
setImmediate( async () => { console.log( await runThread({min, max}) ) } )
}
if (require.main === module)
main()
module.exports = { runThread }
worker.js
// worker.js
const { threadId, workerData, parentPort } = require('worker_threads')
const { generatePrimes } = require('./generatePrimes')
// take parameters from main/parente thread
const { min, max } = workerData
// synchronous long-running CPU-bound computation
const primes = generatePrimes(min, max)
// communicate result to main thread;
// to avoid any suspect that elapsed times depend on a large amount of data exchange (the primes array in this case),
// the returned data is just a short string.
parentPort.postMessage( `Done. Thread id: ${threadId}` )
generatePrimes.js
// generatePrimes.js
// long running / CPU-bound calculation
function generatePrimes(start, range) {
const primes = []
let isPrime = true
let end = start + range
for (let i = start; i < end; i++) {
for (let j = start; j < Math.sqrt(end); j++) {
if (i !== j && i%j === 0) {
isPrime = false
break
}
}
if (isPrime) {
primes.push(i)
}
isPrime = true
}
return primes
}
function main() {
const min = 2
const max = 1e7
console.log( generatePrimes(min, max) )
}
if (require.main === module)
main()
module.exports = { generatePrimes }
TESTS
- TEST 1: WITHOUT WORKER THREADS -> elapsed: ~8 seconds
- TEST 2: SPAWN NR. 1 THREAD -> elapsed: ~8 seconds
- TEST 3: SPAWN NR. 6 THREADS -> elapsed: ~21 seconds
TEST 1: WITHOUT WORKER THREADS
generatePrimes.js
standalone -> elapsed: ~8 seconds
$ /usr/bin/time -f "%E" pidstat 1 -u -e node generatePrimes
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 22/04/2021 _x86_64_ (8 CPU)
09:19:05 UID PID %usr %system %guest %wait %CPU CPU Command
09:19:06 1000 247776 98,02 0,00 0,00 0,00 98,02 5 node
09:19:07 1000 247776 100,00 0,00 0,00 0,00 100,00 5 node
09:19:08 1000 247776 100,00 0,00 0,00 0,00 100,00 5 node
09:19:09 1000 247776 100,00 0,00 0,00 0,00 100,00 5 node
09:19:10 1000 247776 100,00 0,00 0,00 0,00 100,00 5 node
09:19:11 1000 247776 100,00 0,00 0,00 0,00 100,00 5 node
09:19:12 1000 247776 100,00 0,00 0,00 0,00 100,00 5 node
09:19:13 1000 247776 100,00 0,00 0,00 0,00 100,00 5 node
[
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89,
97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151,
157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223,
227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281,
283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359,
367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433,
439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503,
509, 521, 523, 541,
... 664479 more items
]
Average: 1000 247776 99,75 0,00 0,00 0,00 99,75 - node
0:08.60
TEST 2: SPAWN NR. 1 THREAD
main.js
spawn nr. 1 thread -> elapsed: ~8 seconds (again)
$ /usr/bin/time -f "%E" pidstat 1 -u -e node main 1
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 22/04/2021 _x86_64_ (8 CPU)
your machine has 8 cores.
09:21:01 UID PID %usr %system %guest %wait %CPU CPU Command
09:21:02 1000 247867 95,00 2,00 0,00 0,00 97,00 3 node
09:21:03 1000 247867 100,00 0,00 0,00 0,00 100,00 3 node
09:21:04 1000 247867 100,00 0,00 0,00 0,00 100,00 3 node
09:21:05 1000 247867 100,00 0,00 0,00 0,00 100,00 3 node
09:21:06 1000 247867 100,00 0,00 0,00 0,00 100,00 3 node
09:21:07 1000 247867 100,00 0,00 0,00 0,00 100,00 3 node
09:21:08 1000 247867 100,00 0,00 0,00 0,00 100,00 3 node
09:21:09 1000 247867 100,00 1,00 0,00 0,00 101,00 3 node
Done. Thread id: 1
Average: 1000 247867 99,38 0,38 0,00 0,00 99,75 - node
0:08.50
TEST 3: SPAWN NR. 6 THREADS
multiple (6) threads. -> elapsed: ~21 seconds (again)
$ /usr/bin/time -f "%E" pidstat 1 -u -e node main 6
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 22/04/2021 _x86_64_ (8 CPU)
your machine has 8 cores.
09:23:38 UID PID %usr %system %guest %wait %CPU CPU Command
09:23:39 1000 247946 554,00 1,00 0,00 0,00 555,00 0 node
09:23:40 1000 247946 599,00 1,00 0,00 0,00 600,00 0 node
09:23:41 1000 247946 600,00 1,00 0,00 0,00 601,00 0 node
09:23:42 1000 247946 599,00 0,00 0,00 0,00 599,00 0 node
09:23:43 1000 247946 599,00 1,00 0,00 0,00 600,00 0 node
09:23:44 1000 247946 599,00 0,00 0,00 0,00 599,00 0 node
09:23:45 1000 247946 600,00 0,00 0,00 0,00 600,00 0 node
09:23:46 1000 247946 599,00 2,00 0,00 0,00 601,00 0 node
09:23:47 1000 247946 599,00 0,00 0,00 0,00 599,00 0 node
09:23:48 1000 247946 599,00 0,00 0,00 0,00 599,00 0 node
09:23:49 1000 247946 600,00 1,00 0,00 0,00 601,00 0 node
09:23:50 1000 247946 598,00 1,00 0,00 0,00 599,00 0 node
09:23:51 1000 247946 599,00 2,00 0,00 0,00 601,00 0 node
Done. Thread id: 1
Done. Thread id: 4
09:23:52 1000 247946 430,00 0,00 0,00 0,00 430,00 0 node
09:23:53 1000 247946 398,00 0,00 0,00 0,00 398,00 0 node
09:23:54 1000 247946 399,00 1,00 0,00 0,00 400,00 0 node
09:23:55 1000 247946 398,00 0,00 0,00 0,00 398,00 0 node
09:23:56 1000 247946 399,00 0,00 0,00 0,00 399,00 0 node
09:23:57 1000 247946 396,00 3,00 0,00 0,00 399,00 0 node
09:23:58 1000 247946 399,00 0,00 0,00 0,00 399,00 0 node
Done. Thread id: 5
Done. Thread id: 6
09:23:59 1000 247946 399,00 1,00 0,00 0,00 400,00 7 node
Done. Thread id: 2
Done. Thread id: 3
Average: 1000 247946 522,00 0,71 0,00 0,00 522,71 - node
0:21.05
Why I got ~20 seconds instead of expected ~8 seconds? Where I'm wrong?
UPDATE
I separated cpu-bound function generatePrimes in a separated module, just for clarity.
I added more elapsed tests, incrementing number of threads from 1 to 9. Tests show that elapsed time increase with the number of spawned threads. Thats makes no sense for me :(
$ /usr/bin/time -f "%E" node main 1
your machine has 8 cores.
Done. Thread id: 1
0:08.86
$ /usr/bin/time -f "%E" node main 2
your machine has 8 cores.
Done. Thread id: 2
Done. Thread id: 1
0:13.96
$ /usr/bin/time -f "%E" node main 3
your machine has 8 cores.
Done. Thread id: 2
Done. Thread id: 1
Done. Thread id: 3
0:16.71
$ /usr/bin/time -f "%E" node main 4
your machine has 8 cores.
Done. Thread id: 3
Done. Thread id: 2
Done. Thread id: 4
Done. Thread id: 1
0:21.87
$ /usr/bin/time -f "%E" node main 5
your machine has 8 cores.
Done. Thread id: 3
Done. Thread id: 2
Done. Thread id: 5
Done. Thread id: 1
Done. Thread id: 4
0:22.20
$ /usr/bin/time -f "%E" node main 6
your machine has 8 cores.
Done. Thread id: 3
Done. Thread id: 4
Done. Thread id: 6
Done. Thread id: 2
Done. Thread id: 5
Done. Thread id: 1
0:23.74
$ /usr/bin/time -f "%E" node main 7
your machine has 8 cores.
Done. Thread id: 3
Done. Thread id: 4
Done. Thread id: 7
Done. Thread id: 2
Done. Thread id: 5
Done. Thread id: 1
Done. Thread id: 6
0:32.00
$ /usr/bin/time -f "%E" node main 8
your machine has 8 cores.
Done. Thread id: 6
Done. Thread id: 3
Done. Thread id: 2
Done. Thread id: 5
Done. Thread id: 1
Done. Thread id: 8
Done. Thread id: 7
Done. Thread id: 4
0:35.92
$ /usr/bin/time -f "%E" node main 9
your machine has 8 cores.
warning: number of requested threads (9) is higher than number of available cores (8)
Done. Thread id: 8
Done. Thread id: 4
Done. Thread id: 6
Done. Thread id: 9
Done. Thread id: 2
Done. Thread id: 3
Done. Thread id: 7
Done. Thread id: 5
Done. Thread id: 1
0:40.27
BTW, related question: Why program execution time differs running the same program multiple times?