I'm working on a small Raspberry Pi cluster. My host program creates IP packet fragments and sends them to multiple relay programs; the relays receive those fragments and forward them to their destination using raw sockets. Because of the raw sockets, my relay programs must run with sudo. The setup consists of an RPi 3 B v2 and an RPi 2 B v1. SSH is already set up and the nodes can SSH in without a password, although I have to run ssh-agent and ssh-add my keys on each node. I've already managed to run a test program that sends a rank from one node to the other (two different RPis).
I run the MPI programs MPMD-style: since I only have two RPis, the host and one relay run on node #1 and a second relay runs on node #2. The host program takes the path of the file to be sent as a command-line argument.
If I run:
mpirun --oversubscribe -n 1 --host localhost /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,rpi2 /home/pi/Desktop/relay
it runs, but the relays obviously fail because they can't open raw sockets without sudo.
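For context, the failing call in the relay boils down to opening a raw IP socket, which on Linux requires CAP_NET_RAW (root has it), so an unprivileged process gets EPERM. A minimal sketch of just that part (illustrative; the real relay does more):
//raw socket sketch
#include <sys/socket.h>
#include <netinet/in.h>
#include <cerrno>
#include <cstring>
#include <iostream>

int main() {
    // Creating a raw IP socket requires CAP_NET_RAW, so this fails
    // with EPERM ("Operation not permitted") for an unprivileged user.
    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
    if (fd == -1) {
        std::cerr << "socket: " << std::strerror(errno) << '\n';
        return 1;
    }
    return 0;
}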
If I run:
mpirun --oversubscribe -n 1 --host localhost /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,rpi2 sudo /home/pi/Desktop/relay
the relays report a world size of 1 and the host program hangs.
If I run:
mpirun --oversubscribe -n 1 --host localhost sudo /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,rpi2 sudo /home/pi/Desktop/relay
all relays and the host report a world size of 1.
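My understanding is that a world size of 1 means each process came up as a singleton: sudo resets the environment by default, so the OMPI_*/PMIX_* variables that mpirun sets for every rank are gone and MPI_Init can't find its launcher. A small sketch that would confirm this (the variable names are Open MPI's):
//environment check sketch
#include <cstdlib>
#include <initializer_list>
#include <iostream>

int main() {
    // Under plain sudo these print "(unset)", which matches the
    // singleton world size of 1.
    for (const char *var : {"OMPI_COMM_WORLD_SIZE", "OMPI_COMM_WORLD_RANK", "PMIX_RANK"}) {
        const char *val = std::getenv(var);
        std::cout << var << " = " << (val ? val : "(unset)") << '\n';
    }
    return 0;
}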
I found a similar problem here: OpenMPI / mpirun or mpiexec with sudo permission
Following the short answer there, I run:
mpirun --oversubscribe -n 1 --host localhost /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,rpi2 sudo -E /home/pi/Desktop/relay
which results in:
[raspberrypi:00979] OPAL ERROR: Unreachable in file ext2x_client.c at line 109
[raspberrypi:00980] OPAL ERROR: Unreachable in file ext2x_client.c at line 109
*** An error occurred in MPI_Init
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[raspberrypi:00979] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[raspberrypi:00980] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[32582,1],1]
Exit code: 1
--------------------------------------------------------------------------
I've run sudo visudo, and the file on both nodes looks like this:
# User privilege specification
root ALL=(ALL:ALL) ALL
pi ALL = NOPASSWD:SETENV: /etc/alternatives/mpirun
pi ALL=NOPASSWD:SETENV: /usr/bin/orterun
pi ALL=NOPASSWD:SETENV: /usr/bin/mpirun
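If I understand the linked answer correctly, these entries are meant for running mpirun itself under sudo (SETENV is what permits sudo -E to preserve the environment), i.e. something like:
sudo -E mpirun --allow-run-as-root --oversubscribe -n 1 --host localhost /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,rpi2 /home/pi/Desktop/relay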
When I run everything on one node, it just works:
sudo mpirun --allow-run-as-root --oversubscribe -n 1 --host localhost /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,localhost /home/pi/Desktop/relay
//host
#include <mpi.h>
#include <filesystem>
#include <iostream>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    // Query communicator size and this process's rank.
    int world_size = []() {
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        return size;
    }();
    int id = []() {
        int id;
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        return id;
    }(); // rank id, not used in this trimmed-down example
    if (argc != 2) {
        std::cerr << "Filepath not passed\n";
        MPI_Finalize();
        return 0;
    }
    const std::filesystem::path filepath(argv[1]);
    if (not std::filesystem::exists(filepath)) {
        std::cerr << "File doesn't exist\n";
        MPI_Finalize();
        return 0;
    }
    std::cout << "World size: " << world_size << '\n';
    MPI_Finalize();
    return 0;
}
//relay
#include <mpi.h>
#include <iostream>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    // Query communicator size and this process's rank.
    int world_size = []() {
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        return size;
    }();
    int id = []() {
        int id;
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        return id;
    }(); // rank id, not used in this trimmed-down example
    std::cout << "World size: " << world_size << '\n';
    MPI_Finalize();
    return 0;
}
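For reference, both programs build with the Open MPI wrapper compiler; std::filesystem requires C++17 (the source file names here are mine):
mpic++ -std=c++17 host.cpp -o /home/pi/Desktop/host
mpic++ -std=c++17 relay.cpp -o /home/pi/Desktop/relay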
How do I configure the nodes to allow them to run MPI programs with sudo?