Questions tagged [parallel-io]

Use this tag when you have two or more independent I/O devices and want to maximize the I/O throughput of a single task (usually a long-running job). It is related to parallel processing, but concerns I/O usage rather than CPU usage. General techniques include placing data on separate physical disks, ring buffers, separate reader and writer threads, and asynchronous I/O.
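As a rough illustration of two of the techniques named above (separate reader and writer threads, with a bounded buffer between them), here is a minimal Python sketch. The function name `parallel_copy` and the `CHUNK_SIZE` and `QUEUE_DEPTH` values are hypothetical choices made for this example, and a bounded `queue.Queue` stands in for a true ring buffer. Any real gain depends on the source and destination being independent devices.

```python
import queue
import threading

CHUNK_SIZE = 1 << 20  # 1 MiB per read; hypothetical value, tune per device
QUEUE_DEPTH = 8       # max chunks in flight; the bounded queue plays the ring buffer


def parallel_copy(src_path, dst_path, chunk_size=CHUNK_SIZE, depth=QUEUE_DEPTH):
    """Copy src_path to dst_path, letting reads and writes overlap.

    Returns the number of bytes written. Illustrative sketch only.
    """
    buf = queue.Queue(maxsize=depth)
    errors = []
    written = [0]

    def reader():
        try:
            with open(src_path, "rb") as src:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break
                    buf.put(chunk)  # blocks while the buffer is full
        except OSError as exc:
            errors.append(exc)
        finally:
            buf.put(None)  # sentinel: tells the writer there is no more data

    def writer():
        saw_sentinel = False
        try:
            with open(dst_path, "wb") as dst:
                while True:
                    chunk = buf.get()  # blocks while the buffer is empty
                    if chunk is None:
                        saw_sentinel = True
                        break
                    dst.write(chunk)
                    written[0] += len(chunk)
        except OSError as exc:
            errors.append(exc)
        finally:
            # On failure, keep draining so the reader never blocks forever.
            while not saw_sentinel:
                saw_sentinel = buf.get() is None

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return written[0]
```

A call might look like `parallel_copy("/mnt/disk_a/input.img", "/mnt/disk_b/output.img")`, where both paths are placeholders. Threads are sufficient here because CPython releases the global interpreter lock during blocking file reads and writes, so the reader can fetch the next chunk while the writer is still flushing the previous one.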

7 questions
7
votes
7 answers

Fast Disk Cloning

Is there a way to have Linux read ahead when cloning a disk? I use the program named "dd" to clone disks. The last time I did this it seemed as though the OS was reading then writing but never at the same time. Ideally, the destination disk would…
Mike
  • 1,760
  • 2
  • 18
  • 33
3
votes
1 answer

Does MPI_File_write_at() initialize the file with zeros?

Consider the following simple program, which writes the rank of every process whose rank is greater than zero into a file: #include <mpi.h> int main() { MPI_Init(NULL, NULL); int world_rank, world_size; MPI_Comm_rank(MPI_COMM_WORLD,…
handy
  • 696
  • 3
  • 9
  • 22
1
vote
2 answers

How does Spark perform I/O?

It is my understanding that Spark uses parallel I/O to read files. That conclusion comes from other Stack Overflow answers. My question is: does Spark read data using an independent approach or a collective approach? In other words, does each…
Beefger
  • 23
  • 3
0
votes
1 answer

Parallel I/O: file per process vs libraries like HDF5

For high performance computing applications with parallel I/O onto Lustre file systems, does file-per-process output give the upper limit to performance? I had always used HDF5, assuming it was some sort of high performance library, until I…
defleppard
  • 11
  • 1
0
votes
0 answers

How to write large numbers of .jpg, .png, etc images to a single file?

I am trying to optimize deep learning computer vision pipelines for HPC architectures that have high performance parallel IO. Storing large numbers of files in a single directory is an anti-pattern on such systems. Much better IO performance will be…
davidrpugh
  • 4,363
  • 5
  • 32
  • 46
0
votes
1 answer

Write huge arrays to a single file using MPI-IO with shared file pointer

I'm trying to write several long distributed arrays to a single file using MPI-I/O (Open MPI implementation) with a shared file pointer. I get the following error messages: "lseek:Invalid argument" and "WRITE FAILED". I prepared a simplified code snippet to…
0
votes
1 answer

parallel write to different groups with h5py

I'm trying to use parallel h5py to create an independent group for each process and fill each group with some data. What happens is that only one group gets created and filled with data. This is the program: from mpi4py import MPI import h5py rank…
Shazly
  • 95
  • 1
  • 1
  • 11