SHORT: My problem is that the following function which sum the content of a given array in a given range, doesn't take the same execution time for the same task as the parent if called from a child. And by same I mean similar values. Because after some tests, the differences is for about a ~40% more for the child to execute.
LONG: I'm studying computer science and I'm working on the final project of a course. The problem is to sum all the cells of an int array by n processes distributing the workload. then confront it to a single calculation made by parent. The point is to demonstrate that, with big arrays, multiple process can reduce the execution time. However, the sum of all children's times is always more than the parent's even with 50milions data.
Following just the function and the struct I use to pass results.
typedef struct cResults{
double time;
int result;
}cResult;
cResult arraySum(int start, int length, int* dataset)
{
/*Time unit to calculate the computation time*/
clock_t tic, toc = 0;
cResult chunk = {.result = 0, .time =0};
tic = clock(); // Start measure time
for (int i = start; i < start + length; i++)
{
chunk.result += dataset[i];
}
toc = clock(); // Stop measure time
chunk.time = ((double) (toc - tic))/ CLOCKS_PER_SEC;
printf("B: start at: %d, length: %d, sum: %d, in: %f\n", start, length, chunk.result, chunk.time); //! DEBUG
return chunk;
}
WHAT I'VE TRIED SO FAR:
- Since the array is dynamically allocated, I've thought that it could be a bottleneck on the memory access. However, this question (Malloc returns same address in parent and child process after fork) lifted all doubt that even if heap allocated, they are not the same, but a copy of the array.
- I've double checked that the parent will sum correctly and only once the elapsed time communicated by all the children, and then added the
print()
statement just to read and sum manually on the terminal all the results. And again, all checks. - I've tried moving the function call by parent from before to after all children were done, but no changes, then I've tried make the parent
sleep()
right afterfork()
(this was counterproductive for the purpose of the project but just to make sure) for avoiding resource queue. - The random numbers in the array are produced in a repeatable way through a seed, so I've tried same datasets that of course will give almost identical outputs, and again times will change slightly yet maintaining the single execution faster.
- Ultimately I've tried to fork a single child and make it calculate the same range as the parent (so all the array). The time is on average 45% slower on children.
Surely I'm missing a simple thing but i run out of ideas... please be patient I'm learning by my self at my best.
UPDATE 1:
Since I've been asked for a full program, I've refactored mine. However, whereas the project required a single source file I've removed all of non regarding parts of our issue, hence it should be lighter. Unfortunately the frame you'll see that handle the pipe communications is a bit complex I'm afraid but it serve for other purpose that has been removed yet essential to make this works. I do hope it won't be a problem.
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <limits.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
/* ----- PROTOTYPES ----- */
#define MIN 5 // Least number of process and threads
#define MAX 10 // Maximum number of process and threads
#define MSGSIZE 4 // Size of pipe messages
/* =================
*PIPE MESSAGES*
sum = Sum from the next given start to the next given roof.
end = End process
tmr = Return the computation time measured by the process
================= */
/// @brief Struct containing proces id (pid), the result and it's pipe file descriptors
struct process{
pid_t pid;
long int result;
bool done;
double time;
int fd[2];
};
/// @brief Struct that containts the result of the computation
typedef struct cResults{
double time;
long int result;
}cResult;
/// @brief W = write, R = read
enum PIPECONTROLLER{
R = 0,
W = 1
};
/* ----- PROTOTYPES ----- */
cResult arraySum(int start, int length, int* dataset);
int main()
{
/* =================
*USER DEFINED*
================= */
int dataLength = 50000000; // Set the length of the dataset
int nProcess = 1; // How many child process do you want
unsigned int seed = 1; // Change the randomization seed of the data
// System
int* data;
cResult chunk; // Used for storing temporary computational values
// Task related
int taskLenght;
int remainder;
// Pipe related
char pipeMSG[MSGSIZE];
int msgCheck = 1;
/// @brief Processes dashboard
struct process processes[MAX + 1] = { {.pid = 0, .result = 0, .done = false, .time = 0} };
data = malloc(sizeof(int) * dataLength);
srand(seed);
for (int i = 0; i < dataLength; i++)
{
/*Random population between 0-100*/
data[i] = rand() % 100;
}
chunk = arraySum(0, dataLength, data);
processes[nProcess + 1].result = chunk.result;
processes[nProcess + 1].time = chunk.time;
printf("\nCHECK SUM: %ld\n", chunk.result);// ! Debug
#pragma region "START PROCESSES"
/*Calculate how to separate the overhead for the processes*/
taskLenght = dataLength / nProcess;
remainder = dataLength % nProcess;
pid_t myPid = 0;
int startPoint = 0;
processes[nProcess + 1 ].pid = getpid();
/*Open child to parent pipe*/
if (pipe(processes[nProcess + 1 ].fd) == -1)
{
printf("Failed to open pipe on parent\n");
return 1;
}
for (int i = 0; i < nProcess; i++)
{
/*Open new parent to child pipe*/
if (pipe(processes[i].fd) == -1)
{
printf("Failed to open pipe on parent\n");
return 1;
}
myPid = fork();
switch (myPid)
{
case -1: // Error on fork
printf("An error occured while forking the %d process.\n", i);
return 1;
break;
case 0: // Child case
/*Record pid in the dashboard*/
processes[i].pid = getpid();
/*Handle the pipes descriptors*/
close(processes[i].fd[W]);
close(processes[nProcess + 1 ].fd[R]);
i = nProcess;
break;
default: // Parent case
/* Record the pid process into the dashrboard and increment the starting for the next*/
processes[i].pid = myPid;
startPoint += taskLenght;
/*Handle the pipes descriptors*/
close(processes[i].fd[R]);
break;
}
}
/*=========== CHILD PROCESS HANDLER ===========*/
if(myPid == 0)
{
int myID;
bool keepAlive = true;
for(myID = 0; myID < nProcess; myID++)
{
if (processes[myID].pid == getpid())
{
break;
}
}
/*Calculate first iteration of the sum*/
cResult temp = arraySum(startPoint, taskLenght, data);
chunk.result = temp.result;
chunk.time = temp.time;
while(keepAlive)
{
/*Comunicate the id of the process*/
if (write(processes[nProcess + 1 ].fd[W], &myID, sizeof(int)) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
/*Communicate the result of the operation*/
if (write(processes[nProcess + 1 ].fd[W], &chunk.result, sizeof(int)) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
/*Communicate the time elapsed for the operation*/
if (write(processes[nProcess + 1 ].fd[W], &chunk.time, sizeof(double)) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
/*Waits for further instruction*/
msgCheck = read(processes[myID].fd[R], pipeMSG, MSGSIZE);
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
/*Sum command*/
if(!strcmp(pipeMSG, "sum"))
{
msgCheck = read(processes[myID].fd[R], &startPoint, sizeof(int));
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
msgCheck = read(processes[myID].fd[R], &taskLenght, sizeof(int));
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
/*Calculate second iteration for the remaining part*/
temp = arraySum(startPoint, taskLenght, data);
chunk.result += temp.result;
chunk.time += temp.time;
}
/*Close command*/
if(!strcmp(pipeMSG, "end"))
{
keepAlive = false;
}
}
free(data);
close(processes[myID].fd[R]);
exit(0);
}
/*=========== PARENT PROCESS HANDLER ===========*/
if(myPid != 0)
{
/*Close descriptor for writing on main pipe.*/
close(processes[nProcess + 1 ].fd[W]);
int targetProcess = nProcess + 1; // Target self
bool onGoing = true;
chunk.result = 0;
chunk.time = 0;
while(onGoing)
{
/*Listen from processes if someone ended the task*/
msgCheck = read(processes[nProcess + 1 ].fd[R], &targetProcess, sizeof(int));
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
/*Get result from child process*/
msgCheck = read(processes[nProcess + 1 ].fd[R], &processes[targetProcess].result, sizeof(int));
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
/*Get elapsed time from child process*/
msgCheck = read(processes[nProcess + 1 ].fd[R], &processes[targetProcess].time, sizeof(double));
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
processes[targetProcess].done = true;
/*Check if remainder to start new task*/
if(remainder != 0)
{
startPoint = taskLenght * nProcess;
processes[targetProcess].done = false;
if (write(processes[targetProcess].fd[W], "sum", MSGSIZE) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
if (write(processes[targetProcess].fd[W], &startPoint, sizeof(int)) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
if (write(processes[targetProcess].fd[W], &remainder, sizeof(int)) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
remainder = 0; //Avoid looping task
}
/*Check for pending response and process final result*/
for (int i = 0; i < nProcess; i++)
{
if(processes[i].done)
{
chunk.result += processes[i].result;
chunk.time += processes[i].time;
onGoing = false;
continue;
}
onGoing = true;
/*Reset total calculations*/
chunk.result = 0;
chunk.time = 0;
break;
}
/*Reset to self target*/
targetProcess = nProcess + 1;
}
printf("Parent calculated: %ld in = %fs\n", processes[nProcess + 1].result, processes[nProcess + 1].time); //! Debug
printf("Processes calculated: %ld in = %fs\n", chunk.result, chunk.time); //! Debug
}
}
cResult arraySum(int start, int length, int* dataset)
{
/*Time unit to calculate the computation time*/
clock_t tic, toc = 0;
cResult chunk = {.result = 0, .time =0};
tic = clock(); // Start measure time
for (int i = start; i < start + length; i++)
{
chunk.result += dataset[i];
}
toc = clock(); // Stop measure time
chunk.time = ((double) (toc - tic))/ CLOCKS_PER_SEC;
printf("start at: %d, length: %d, sum: %ld, in: %f\n", start, length, chunk.result, chunk.time); //! Debug
return chunk;
}
If you want to try this out you'll find in USER DEFINED some variable to play with.
I'll share some of my results, with seed = 1
.
LENGTH | PARENT | 1 CHILD | 5 CHILDREN | 10 CHILDREN |
---|---|---|---|---|
1'000 | 0.000001 | 0.000002 | 0.000003 | 0.000006 |
100'000 | 0.000085 | 0.000107 | 0.000115 | 0.000120 |
10'000'000 | 0.008693 | 0.015143 | 0.016120 | 0.015982 |
100'000'000 | 0.089563 | 0.148095 | 0.146698 | 0.149421 |
500'000'000 | 0.669474 | 0.801828 | 0.744381 | 0.816883 |
As you can see even repeating the task in a single child process in some cases required twice the time.
As ArkadiuszDrabczyk pointed out, it could be a scheduling issue, still why has always to be the child the slowest one?
UPDATE 2:
Since pipe's problems arose, I wrote another source just to exclude these concerns and, of course, the problem remained.
However I wasn't convinced about the deep-copy of the dataset[]
array stored on heap hence some research later I found this: What is copy-on-write?. After this new found knowledge, I've added this useless function to be called right before the arraySum()
:
void justCheck(int start, int length, int* dataset)
{
for (int i = start; i < start + length; i++)
{
dataset[i] = dataset[i];
}
return;
}
This little snippet managed to level the differences between times. At least now they lay on the same order of magnitude. Down the results:
LENGTH | PARENT | 1 CHILD | 5 CHILDREN | 10 CHILDREN |
---|---|---|---|---|
1'000 | 0.000002 | 0.000001 | 0.000004 | 0.000003 |
100'000 | 0.000099 | 0.000110 | 0.000124 | 0.000121 |
10'000'000 | 0.009496 | 0.008686 | 0.009316 | 0.009248 |
100'000'000 | 0.090267 | 0.092168 | 0.089862 | 0.093356 |
500'000'000 | - | - | - | - |
Unfortunately this edit rise another problem. With big set of data this function freeze. I know this isn't the right way to force the COW so... Any suggestions?