Context
I've been working on a university project that requires both OpenMP and MPI to examine a .csv file (about 1 million lines) and extract some statistics.
I've managed to write and test a program that works fine on my machine, so I opened an AWS account to test the actual parallel performance of the program when it runs on multiple nodes.
Problem
When I run the code on a single AWS EC2 instance (Amazon Linux 2, t2.xlarge), I get a segmentation fault that is due to the MPI_Bcast call, even though the code worked fine on my machine. I really want to understand this problem before extending its execution to other nodes.
I've narrowed down the code that produces this error to the following:
#define FIN_PATH R"(NYPD_Motor_Vehicle_Collisions.csv)"
#define LINES 955928
#define MAX_LINE_LENGHT 500
...
int main() {
    int RANK;
    int SIZE;
    int THREAD_SUPPORT;
    MPI_Init_thread(nullptr, nullptr, MPI_THREAD_FUNNELED, &THREAD_SUPPORT);
    MPI_Comm_size(MPI_COMM_WORLD, &SIZE);
    MPI_Comm_rank(MPI_COMM_WORLD, &RANK);
    int i;
    // Initialize empty dataset
    char ** data = new char*[LINES];
    for(i = 0; i < LINES; ++i)
        data[i] = new char[MAX_LINE_LENGHT] {'\0'};
    // Populate dataset
    if (RANK == 0) {
        string line;
        ifstream fin(FIN_PATH, ios::in);
        getline(fin, line);
        for(i = 0; i < LINES; ++i) {
            getline(fin, line);
            normalize(&line);
            line.copy(data[i], line.size() + 1);
        }
        fin.close();
    }
    // Broadcast dataset to each process
    MPI_Bcast(&data[0][0], LINES * MAX_LINE_LENGHT, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
As you can see, I'm reading the file and saving every char into a 2D array (because MPI cannot handle strings), which I then broadcast to every process (not elegant, but it helps me avoid major headaches down the road, e.g. implementing MPI I/O).
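For comparison, here is a minimal sketch of a layout in which all rows live in a single allocation. In the code above every row comes from its own new char[...], so the rows are not guaranteed to be adjacent in memory, whereas MPI_Bcast treats its buffer argument as one contiguous run of count elements. The names pack_lines and row are hypothetical helpers, not part of my program, and truncating over-long lines to width - 1 characters is an assumption made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not in the original program): packs each line into a
// fixed-width, zero-padded row of ONE contiguous buffer.
// Assumption: lines longer than width - 1 characters are truncated so that
// every row keeps a terminating '\0'.
std::vector<char> pack_lines(const std::vector<std::string>& lines,
                             std::size_t width) {
    assert(width > 0);
    std::vector<char> flat(lines.size() * width, '\0');
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::size_t n = std::min(lines[i].size(), width - 1);
        std::copy_n(lines[i].begin(), n, flat.data() + i * width);
    }
    return flat;
}

// Hypothetical accessor: row i starts at offset i * width in the flat buffer.
const char* row(const std::vector<char>& flat, std::size_t width,
                std::size_t i) {
    return flat.data() + i * width;
}
```

With a buffer like this, a single call such as MPI_Bcast(flat.data(), static_cast<int>(flat.size()), MPI_CHAR, 0, MPI_COMM_WORLD) would cover the whole dataset, provided every rank has allocated flat with the same size beforehand.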
I don't get any error when reading the file, only when I reintroduce the MPI_Bcast function.
I made sure that the AWS EC2 instance had enough RAM (16 GB), so I don't see why this segmentation fault would occur. I'm new to everything here (except C++ programming), so I don't have the tools to debug an error like this yet. Any insight is appreciated!