
Context

I've been working on a university project that requires the use of both OpenMP and MPI to examine a .csv file (about 1 million lines) and extract some statistics.

I've managed to write and test a program that works fine on my machine, so I opened an AWS account to test the actual parallel performance of the program when it runs on multiple nodes.

Problem

When I run the code on a single AWS EC2 instance (Amazon Linux 2, t2.xlarge), I get a segmentation fault caused by the MPI_Bcast call, even though the code worked fine on my machine. I really want to understand this problem before extending execution to other nodes.

I've narrowed down the code that produces this error to the following:


#define FIN_PATH R"(NYPD_Motor_Vehicle_Collisions.csv)"
#define LINES 955928
#define MAX_LINE_LENGHT 500

...

int main() {
    int RANK;
    int SIZE;
    int THREAD_SUPPORT;

    MPI_Init_thread(nullptr, nullptr, MPI_THREAD_FUNNELED, &THREAD_SUPPORT);

    MPI_Comm_size(MPI_COMM_WORLD, &SIZE);
    MPI_Comm_rank(MPI_COMM_WORLD, &RANK);

    int i;

    //Initialize empty dataset
    char ** data = new char*[LINES];

    for(i = 0; i < LINES; ++i)
        data[i] = new char[MAX_LINE_LENGHT] {'\0'};


    // Populate dataset
    if (RANK == 0) {
        string line;
        ifstream fin(FIN_PATH, ios::in);

        getline(fin, line);      // skip the CSV header row

        for(i = 0; i < LINES; ++i) {
            getline(fin, line);
            normalize(&line);
            line.copy(data[i], line.size() + 1);
        }
        fin.close();
    }


    // Broadcast dataset to each process
    MPI_Bcast(&data[0][0], LINES * MAX_LINE_LENGHT, MPI_CHAR, 0, MPI_COMM_WORLD);


    MPI_Finalize();
    return 0;
}


As you can see, I'm reading the file and saving every character into a 2D char array (because MPI cannot handle std::string objects directly), which I then broadcast to every process. It's not elegant, but it helps me avoid major headaches down the road (e.g., implementing MPI I/O).

I don't get any error when reading the file; the segmentation fault only appears when I reintroduce the MPI_Bcast call.

I made sure the AWS EC2 instance had enough RAM (16 GB), so I don't see why this segmentation fault would occur. I'm new to everything here except C++ programming, so I don't yet have the tools to debug an error like this. Any insight is appreciated!

  • You should allocate one big array of `LINES * MAX_LINE_LENGHT` chars. MPI cannot follow the pointers in 2D arrays that are implemented as an array of pointers. (OK, it can, but it is a rather involved process and requires the definition of user datatypes.) This is a common mistake, and you are not the first one to encounter it. – Hristo Iliev Apr 24 '20 at 17:45
  • @HristoIliev Thank you! That never crossed my mind because it worked fine when I ran the code on my computer. I can't wait to try your suggestion! – Rahman Khalaf Apr 24 '20 at 18:01
  • You can allocate one big array and then make an array of pointers into the big array. That way you get the best of both worlds: a contiguous array for MPI and convenient 2D array notation for yourself. This is what most HPC users do (a minimal sketch follows these comments). See the question I've closed yours as a duplicate of. – Hristo Iliev Apr 24 '20 at 18:07
  • @HristoIliev It works! Following the instructions provided in the other thread you linked I don't get any segmentation faults. – Rahman Khalaf Apr 27 '20 at 15:53
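
For reference, a minimal sketch of the approach described in the comments above, reusing the question's constants (the name `storage` is just for illustration): one contiguous block backs all the lines, and a separate pointer array preserves the convenient data[i][j] notation.

char * storage = new char[LINES * MAX_LINE_LENGHT]();   // one contiguous, zero-initialized block
char ** data = new char*[LINES];

for(int i = 0; i < LINES; ++i)
    data[i] = storage + i * MAX_LINE_LENGHT;            // row i points into the block

// Rank 0 fills data[i] exactly as before; the broadcast now names
// a single real object:
MPI_Bcast(storage, LINES * MAX_LINE_LENGHT, MPI_CHAR, 0, MPI_COMM_WORLD);

delete[] data;      // only the pointer array...
delete[] storage;   // ...and the single block need freeing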

1 Answer

for(i = 0; i < LINES; ++i)
    data[i] = new char[MAX_LINE_LENGHT] {'\0'};

You have allocated a whole bunch of individual buffers, each at its own address.

MPI_Bcast(&data[0][0], LINES * MAX_LINE_LENGHT, MPI_CHAR, 0, MPI_COMM_WORLD);

But you told MPI_Bcast to send a single object of size LINES * MAX_LINE_LENGHT. Nowhere did you create an object like that.

You need to allocate a single contiguous object at a single address, containing all the data you want to send, because that's what MPI_Bcast expects.
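
For illustration, a minimal sketch of such a single allocation, assuming the same LINES and MAX_LINE_LENGHT constants as in the question; line i then simply starts at offset i * MAX_LINE_LENGHT.

// A single object of LINES * MAX_LINE_LENGHT chars at one address,
// value-initialized to '\0':
char * data = new char[LINES * MAX_LINE_LENGHT]();

// On rank 0, copy line i into its slot:
//     line.copy(data + i * MAX_LINE_LENGHT, line.size());

// The count passed to MPI_Bcast now matches an object that actually exists:
MPI_Bcast(data, LINES * MAX_LINE_LENGHT, MPI_CHAR, 0, MPI_COMM_WORLD);

delete[] data;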

David Schwartz