IoUring with batch submission decreases throughput

Question

I want to use IoUring to speed up my application. After some searching, I found that submitting I/O requests in batches can yield higher throughput.

Therefore, I wrote the following code to read a large file (3.6GB):

    let mut file = fs::File::open(&path).unwrap();
    // The file is split into multiple segments; read the offsets so that each segment can be read correctly
    let offsets = read_offsets(&mut file);

    // Set up the ring
    let mut ring = IoUring::new(1024).unwrap();
    // Read batch_size from the command line argument
    let batch_size = args.batch_size;

    let rounds = (offsets.len() - 1) / batch_size;
    // Pre-allocate the buffer to avoid repeated buffer allocation
    let mut buffers = vec![Vec::new(); batch_size];

    let now = std::time::Instant::now();
    for i in 0..rounds {
        let base = i * batch_size;
        batch_read(
            &file,
            &mut ring,
            &mut buffers,
            &offsets,
            base,
            batch_size,
        )
    }
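One detail of the loop above: `rounds` is computed with integer division, so when the number of segments (`offsets.len() - 1`) is not a multiple of `batch_size`, the trailing segments are never read, and different batch sizes read slightly different amounts of data. A minimal sketch of a batch split that keeps the final partial batch follows; `batch_ranges` is a hypothetical helper, not part of the original code or of the io-uring crate.

```rust
/// Split `n_segments` segments into half-open index ranges `(start, end)`
/// of at most `batch_size` segments each, keeping the final partial batch.
/// Hypothetical helper for illustration only.
fn batch_ranges(n_segments: usize, batch_size: usize) -> Vec<(usize, usize)> {
    assert!(batch_size > 0, "batch_size must be positive");
    (0..n_segments)
        .step_by(batch_size)
        // Clamp the end of the last range to the number of segments.
        .map(|base| (base, (base + batch_size).min(n_segments)))
        .collect()
}
```

Called as `batch_ranges(offsets.len() - 1, batch_size)`, each `(start, end)` pair would describe one batch of `end - start` reads, so every batch size covers the same set of segments.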

The core function is batch_read, defined as follows:

    #[allow(clippy::uninit_vec)]
    fn batch_read(
        file: &fs::File,
        ring: &mut IoUring,
        buffers: &mut [Vec<u8>],
        offsets: &[u64],
        base: usize,
        batch_size: usize
    ) {
        for j in 0..batch_size {
            let mut submission = ring.submission();
            let start = offsets[base + j];
            let end = offsets[base + j + 1];
            let buf = buffers.get_mut(j).unwrap();
            let len = (end - start) as usize;
            // Resize the buffer without initializing it; the read is expected to overwrite it
            buf.clear();
            buf.reserve(len);
            unsafe {
                buf.set_len(len);
            }

            // Queue one read of the whole segment at its file offset
            let read_e = opcode::Read::new(types::Fd(file.as_raw_fd()), buf.as_mut_ptr(), len as _)
                .offset64(start as i64)
                .build();
            unsafe {
                submission.push(&read_e).unwrap();
            }
        }
        // Batch submit and wait here
        ring.submit_and_wait(batch_size).unwrap();
        // Drain the completion queue; the results are not inspected
        let completions = ring.completion();
        for _ in completions {}
    }

However, after increasing batch_size, it takes more time to read the file, which does not make sense to me. According to Lord of the io_uring, batch submission reduces the number of system calls, so it should be faster. It may become slower when batching takes more time than doing the I/O itself.

batch_size	time(us)
1	2302266
8	2667608
64	2896108
256	3001141

I am using IoUring on Ubuntu 20.04 with kernel version 5.15.0-67-generic. The filesystem is btrfs on a RAID0 array of 4 NVMe SSDs, each of which can achieve 2.8GB/s of sequential I/O. Therefore, I can confirm that the I/O bottleneck is never reached.
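To put numbers on that claim, the timings in the table can be converted to throughput. The sketch below assumes the 3.6GB file is exactly 3.6e9 bytes (an assumption; the exact size is not stated) and uses hypothetical helper names.

```rust
/// Measurements from the table: (batch_size, elapsed time in microseconds).
const MEASUREMENTS: [(usize, u64); 4] = [
    (1, 2_302_266),
    (8, 2_667_608),
    (64, 2_896_108),
    (256, 3_001_141),
];

/// Throughput in GB/s (1 GB = 1e9 bytes) for `bytes` read in `micros` microseconds.
fn throughput_gb_per_s(bytes: u64, micros: u64) -> f64 {
    (bytes as f64 / 1e9) / (micros as f64 / 1e6)
}

/// Throughput for each measured batch size, given the file size in bytes.
fn throughput_table(file_bytes: u64) -> Vec<(usize, f64)> {
    MEASUREMENTS
        .iter()
        .map(|&(batch, micros)| (batch, throughput_gb_per_s(file_bytes, micros)))
        .collect()
}
```

Under that assumption, `throughput_table(3_600_000_000)` gives roughly 1.56 GB/s at batch size 1 and roughly 1.20 GB/s at batch size 256. Both are well below the nominal 4 x 2.8 = 11.2 GB/s aggregate sequential rate of the four SSDs, ignoring RAID and filesystem overhead.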

Update: Can a single thread reach the I/O bottleneck? According to the blog, a single thread can only have one outstanding I/O request at a time. Does IoUring also have this constraint? If so, is the only advantage of batch submission that it reduces the number of io_uring_enter system calls?
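For reference, the one-outstanding-request case corresponds to plain blocking reads. A minimal sketch of such a baseline over the same segment layout is shown below, using only the standard library; `read_segments_sync` is a hypothetical name, and the code assumes a Unix target and that `offsets` is sorted in ascending order. Timing it against the IoUring version could help show how much of the difference comes from the submission pattern.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// Read every segment [offsets[i], offsets[i + 1]) with one blocking
/// positioned read each, so at most one request is outstanding at a time.
/// Returns the total number of bytes read.
/// Assumes `offsets` is sorted in ascending order.
fn read_segments_sync(file: &File, offsets: &[u64]) -> io::Result<u64> {
    let mut buf = Vec::new();
    let mut total = 0u64;
    for pair in offsets.windows(2) {
        let len = (pair[1] - pair[0]) as usize;
        // Reuse one buffer across segments; resize zero-fills any new bytes.
        buf.resize(len, 0);
        // Blocks until the whole segment has been read or an error occurs.
        file.read_exact_at(&mut buf, pair[0])?;
        total += len as u64;
    }
    Ok(total)
}
```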
