
I am forking two child processes.

I have two cases:

  1. I just note down the time in each fork.
  2. I note down the time in each fork and then perform some steps, like requesting a website.

As per my understanding, in both cases the difference between the start times of the two children should be in the same range. But in both cases there is a large difference.

Case 1 code:

use Parallel::ForkManager;
use WWW::Mechanize;
use LWP::UserAgent;
use Time::HiRes qw/gettimeofday/;
use Time::Format qw/%time/;
use POSIX qw( strftime );

my $count = 2;
my $pm    = Parallel::ForkManager->new( $count );

for ( 1 .. $count ) {

    print "$_ ";

    my ( $secs, $microsecs ) = gettimeofday();
    print
        strftime( "%H:%M:%S", localtime( $secs ) ) .
        sprintf(".%04d", $microsecs / 10 );

    print "\n";

    $pm->finish;
}

$pm->wait_all_children;    ## wait for the child processes

output:

1 20:53:25.41494
2 20:53:25.65602

So basically I want to do some Mechanize operations using Perl, but I need all children to start their execution at the same time, which is not the case here. I need almost 1000 children to start execution at the same time. Please improve my code, or suggest a better way to implement this.

  • You want impossible things. – Sinan Ünür Jul 30 '15 at 15:44
  • So, you mean load testing is not possible using Perl? I just want to achieve something like 1000 req/sec. Is there any other module which I should use? – Saurabh Shrivastava Jul 30 '15 at 15:57
  • Of course, load testing is possible in Perl. What is not possible is what you want: "I need almost 1000 children to start execution at ***same time***." – Sinan Ünür Jul 30 '15 at 15:58
  • Also, load testing with Mech seems counter-productive to me. It is not going to stress the target web site to the fullest extent possible. When I did this, whether on purpose, or by accident, I preferred [using a browser](http://perltricks.com/article/139/2014/12/11/Automated-Internet-Explorer-screenshots-using-Win32--OLE) that loaded all required assets. – Sinan Ünür Jul 30 '15 at 16:01
  • I have a webpage with a form, and submitting the form hits an API. All I want is to make 1000 hits/sec to my API. Can this be done in Perl? If not 1000, how much could be achieved? – Saurabh Shrivastava Jul 30 '15 at 16:03
  • You can achieve as many hits as you want. It is a function of your method, and other constraints. – Sinan Ünür Jul 30 '15 at 16:07
  • Can you please direct me to a starting point? Should I use some other module or method? – Saurabh Shrivastava Jul 30 '15 at 16:08
  • What operating system? – pilcrow Jul 30 '15 at 16:12
  • Does [ab](https://httpd.apache.org/docs/2.4/programs/ab.html) not work for you? See what you can achieve with that on your platform. You will not be able to do better than that with Perl (and, possibly, significantly worse). However, with the right OS, hardware, and program, I don't see why you shouldn't be able to achieve 1,000 requests per second. – Sinan Ünür Jul 30 '15 at 16:15
  • I am using Windows. Thanks @SinanÜnür, let me explore it. – Saurabh Shrivastava Jul 30 '15 at 16:17
  • Your scaling of the microseconds is wrong. You should use either `printf "%06d", $microseconds` or `printf "%04d", $microseconds / 100` – Borodin Jul 30 '15 at 16:48
  • ***No two processor events ever happen simultaneously***. It is an illusion created by the operating system, which switches processor time rapidly between all the processes that are running. Even if you have a multi-core processor and can control what all the cores do, you probably have no more than eight cores, so you will have a minimum of 125 processes sharing one core, and you have to allow for the operating system and other processes using up some of the time as well. – Borodin Jul 30 '15 at 16:54
  • Completely different approach: use someone's service that is built for this. https://loader.io/ – simbabque Jul 30 '15 at 17:11

2 Answers


You don't create any child processes!!! Add the following at the start of your loop:

$pm->start and next;
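
With that line added (and with the microsecond formatting fixed as Borodin suggests in the comments), the loop from the question looks like this:

for ( 1 .. $count ) {

    # start() returns the child's PID in the parent and 0 in the child,
    # so the parent skips to the next iteration and the child runs the body.
    $pm->start and next;

    my ( $secs, $microsecs ) = gettimeofday();
    print "$_ ",
        strftime( "%H:%M:%S", localtime( $secs ) ),
        sprintf( ".%06d", $microsecs ), "\n";

    $pm->finish;
}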

It seems your primary concern is now throughput (1000 req/s), not when the requests start. In this situation, you can completely eliminate the time it takes to start a new worker by creating them in advance and reusing them. This is called the worker pool model, and a simple example can be found here. (The example uses threads, but the same model can be used with processes too, if that's preferable.)
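
As a rough sketch of that worker pool model (this is not the linked example; the pool size, URL, and fetch step are placeholders, and it assumes a Thread::Queue recent enough to provide end()):

use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new();

# Spawn the pool up front, so no per-request startup cost remains.
my @workers = map {
    threads->create( sub {
        # dequeue() blocks for work, and returns undef once end()
        # has been called and the queue has drained.
        while ( defined( my $url = $queue->dequeue() ) ) {
            # ... fetch $url and process the response here ...
        }
    } );
} 1 .. 8;

# Hand work to the already-running workers.
$queue->enqueue("http://www.example.com/api") for 1 .. 1000;

# No more jobs; let the workers drain the queue and exit.
$queue->end();
$_->join() for @workers;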

Now, even if you eliminate the time it takes to start up the workers by reusing them, that doesn't leave you with much time to actually construct the requests and handle the responses. Even if the work is spread perfectly across 8 cores, you only get 8 ms per request:

1000 req/s
= 1000/8 req/s on each core
= 125 req/s on each core
= 8 ms/req

That's not a lot. You may need to optimize your code. I'd dump LWP in favour of Net::Curl::Multi. In fact, if you use Net::Curl::Multi, all the requests should be made by one thread (the main one?), though you might still want to prepare the requests and handle the responses in worker threads.
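
For illustration, here is a rough single-threaded sketch of driving many transfers with Net::Curl::Multi; the URL and request count are placeholders, and a production event loop would be more careful about timeouts and error handling:

use strict;
use warnings;
use Net::Curl::Easy qw( :constants );
use Net::Curl::Multi;

# Placeholder target and request count.
my @urls = ( 'http://www.example.com/api' ) x 100;

my $multi  = Net::Curl::Multi->new();
my $active = 0;

for my $url ( @urls ) {
    my $easy = Net::Curl::Easy->new();
    $easy->setopt( CURLOPT_URL, $url );
    # The default write handler appends the body to the referenced scalar.
    $easy->setopt( CURLOPT_WRITEDATA, \my $body );
    $multi->add_handle( $easy );
    $active++;
}

# One thread drives every transfer.
while ( $active ) {
    my ( $r, $w, $e ) = $multi->fdset();
    my $timeout = $multi->timeout();    # milliseconds; may be -1
    select( $r, $w, $e, $timeout / 1000 ) if $timeout > 0;

    my $running = $multi->perform();
    if ( $running != $active ) {
        # Reap whatever finished.
        while ( my ( $msg, $easy, $result ) = $multi->info_read() ) {
            $multi->remove_handle( $easy );
            $active--;
            # ... inspect $result and the response here ...
        }
    }
}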

ikegami
  • @Sinan Ünür, I think that's just poor wording on his part. He's asking why adding to the workload of the first child delays the start of the second child. This fixes that problem. The children will now start within a millisecond of each other, no matter what work the children perform. – ikegami Jul 30 '15 at 15:55
  • Sorry, I forgot to add it. Now in both cases, the difference between the two children's start times is about 20 ms, which is still big if I want 1000 req/sec. Please suggest some other module or a better way. – Saurabh Shrivastava Jul 30 '15 at 15:58
  • @ikegami, you mentioned that the children will start within a millisecond of each other; this is what I want to achieve, but the difference is about 25 ms. – Saurabh Shrivastava Jul 30 '15 at 16:01
  • I was using the numbers you gave (0.14ms), but I guess that doesn't actually perform the `fork`, which isn't exactly a cheap operation. It takes 1ms on my slow shared web host, so the first thing you should do is to get a machine that's faster than a Commodore 64. Then, you'll need to switch from Perl to C and reuse workers instead of starting a new one for each request. – ikegami Jul 30 '15 at 16:05
  • OK, so is there any other method/module to serve my purpose? I have a webpage with a form, and submitting the form hits an API. All I want is to make 1000 hits/sec to my API. – Saurabh Shrivastava Jul 30 '15 at 16:11
  • Since you only care about throughput now, all you need to do is to eliminate worker startup time by reusing the workers instead of continually creating new ones for each request. You might still have to switch to C (or something) to get that kind of throughput (depending on how you build the requests and what kind of work you do with the response), even if it's perfectly spread across 8 cores. (1000/8 = 125 req/s = 8ms each) – ikegami Jul 30 '15 at 17:01
  • Updated my answer, mostly to incorporate the previous comment. – ikegami Jul 30 '15 at 17:30

I am using Windows.

You should have stated this as the first thing in your question.

On Windows, Perl's fork is not a real fork: it is emulated using threads. Many of the features of real *nix forks do not apply. While Parallel::ForkManager can still be useful for a variety of tasks on Windows, do not expect the kind of performance you need for this particular task.

You may also be disappointed with ab on Windows (although that strongly depends on your hardware, how ab was compiled, etc.). It would still perform better than Perl + ForkManager + Mech, but I do not expect it to be able to reach what it could with BSD or Linux on the same hardware.

For this specific purpose, you may be better off using a non-Windows machine, unless you want to venture into Windows-specific network programming.

Also, curl builds cleanly on Windows 8.1 using Visual Studio 2013 and 2015 Community Edition tools, and, if that does not work, binaries are available.

But the path of least resistance is not to do this from Windows. (The version of ab on the Windows 8.1 partition may be older than the Arch one in this instance, but I am not motivated to fix that.)

Here is a simple comparison. I am posting the Windows results first, then I am going to reboot into ArchLinux and post those:

C:\opt\httpd\bin> ab -n 100 -c 10 http://www....com/                
This is ApacheBench, Version 2.3                    
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ 
Licensed to The Apache Software Foundation, http://www.apache.org/       

Benchmarking www.....com (be patient).....done                       

Server Software:        Apache                                           
Server Hostname:        www.....com                                  
Server Port:            80                                               

Document Path:          /                                                
Document Length:        4502 bytes                                       

Concurrency Level:      10                                               
Time taken for tests:   6.391 seconds                                    
Complete requests:      100                                              
Failed requests:        0                                                
Write errors:           0                                                
Total transferred:      475900 bytes                                     
HTML transferred:       450200 bytes                                     
Requests per second:    15.65 [#/sec] (mean)                             
Time per request:       639.136 [ms] (mean)                              
Time per request:       63.914 [ms] (mean, across all concurrent requests)
Transfer rate:          72.71 [Kbytes/sec] received                      

Connection Times (ms)                                                    
              min  mean[+/-sd] median   max                              
Connect:       59   63   2.7     62      76                              
Processing:    65  542  99.0    567     586                              
Waiting:       63  321 162.4    316     586                              
Total:        125  605  98.5    630     653                              

Percentage of the requests served within a certain time (ms)             
  50%    630                                                             
  66%    633                                                             
  75%    635                                                             
  80%    638                                                             
  90%    644                                                             
  95%    647                                                             
  98%    649                                                             
  99%    653                                                             
 100%    653 (longest request)

compared to the following in XFCE4 Terminal on ArchLinux (same hardware, same network connection):

$ ab -n 100 -c 10 http://www....com/ 
This is ApacheBench, Version 2.3 
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking www.....com (be patient).....done


Server Software:        Apache
Server Hostname:        www.....com
Server Port:            80

Document Path:          /
Document Length:        4502 bytes

Concurrency Level:      10
Time taken for tests:   1.799 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      475900 bytes
HTML transferred:       450200 bytes
Requests per second:    55.60 [#/sec] (mean)
Time per request:       179.867 [ms] (mean)
Time per request:       17.987 [ms] (mean, across all concurrent requests)
Transfer rate:          258.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       59   81  16.3     77     113
Processing:    62   91  19.9     89     151
Waiting:       60   82  16.5     79     139
Total:        126  172  15.1    174     226

Percentage of the requests served within a certain time (ms)
  50%    174
  66%    175
  75%    176
  80%    180
  90%    190
  95%    192
  98%    209
  99%    226
 100%    226 (longest request)
Sinan Ünür
  • I can't find where he said he's using Windows. A deleted comment? He should definitely not be using `fork` for this there. // Why would `ab` be slow on Windows? – ikegami Jul 30 '15 at 17:02
  • @ikegami Over the years, across Linux kernel, Windows, and `ab` versions, I have noticed I get higher performance when I boot into Linux on the same hardware than when I run it from a Windows cmd shell. I do not have solid analysis. – Sinan Ünür Jul 30 '15 at 17:05
  • `curl`, or rather `libcurl`, can be accessed from Perl via Net::Curl::Multi. Way faster than LWP. – ikegami Jul 30 '15 at 17:06
  • @ikegami: *"I am using windows"* -- currently the tenth comment under the question. I added the tag accordingly – Borodin Jul 31 '15 at 09:37