I want to use `IoUring` to speed up my application. After some searching, I found that submitting IO requests in batches can give higher throughput.
So I wrote the following code to read a big file (3.6 GB):

    let mut file = fs::File::open(&path).unwrap();
    // The file is split into multiple segments; read the offsets so that
    // each segment can be read correctly
    let offsets = read_offsets(&mut file);
    // Set up the ring
    let mut ring = IoUring::new(1024).unwrap();
    // Read batch_size from the command-line arguments
    let batch_size = args.batch_size;
    let rounds = (offsets.len() - 1) / batch_size;
    // Pre-allocate the buffers to avoid repeated allocation
    let mut buffers = vec![Vec::new(); batch_size];
    let now = std::time::Instant::now();
    for i in 0..rounds {
        let base = i * batch_size;
        batch_read(
            &file,
            &mut ring,
            &mut buffers,
            &offsets,
            base,
            batch_size,
        );
    }

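One detail worth noting about the loop above: `rounds` comes from integer division, so when the number of segments is not a multiple of `batch_size`, the trailing segments are never read, and each batch size ends up reading a slightly different amount of data. A minimal sketch of one way to enumerate the batches including the final partial one follows. `batch_ranges` is a hypothetical helper, not part of the code above, and it only computes indices; it does no IO.

```rust
/// Hypothetical helper: split `num_segments` segments into `(base, len)`
/// batches of at most `batch_size` segments, keeping the last partial batch.
fn batch_ranges(num_segments: usize, batch_size: usize) -> Vec<(usize, usize)> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    (0..num_segments)
        .step_by(batch_size)
        .map(|base| (base, batch_size.min(num_segments - base)))
        .collect()
}

fn main() {
    // 10 segments in batches of 4: two full batches plus one batch of 2.
    let batches = batch_ranges(10, 4);
    println!("{:?}", batches);
    // Integer division alone would give only 10 / 4 = 2 full rounds.
    println!("full rounds only: {}", 10 / 4);
}
```

With something like this, `batch_read` would take the batch length from each pair instead of always using `batch_size`, so every run covers the same set of segments.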
The core function is `batch_read`, which is defined as follows:

    use std::fs;
    use std::os::unix::io::AsRawFd;

    use io_uring::{opcode, types, IoUring};

    #[allow(clippy::uninit_vec)]
    fn batch_read(
        file: &fs::File,
        ring: &mut IoUring,
        buffers: &mut [Vec<u8>],
        offsets: &[u64],
        base: usize,
        batch_size: usize,
    ) {
        for j in 0..batch_size {
            let mut submission = ring.submission();
            // Segment j of this batch spans [start, end) in the file
            let start = offsets[base + j];
            let end = offsets[base + j + 1];
            let buf = buffers.get_mut(j).unwrap();
            let len = (end - start) as usize;
            // Resize the buffer without initializing it; the read fills it
            buf.clear();
            buf.reserve(len);
            unsafe {
                buf.set_len(len);
            }
            let read_e = opcode::Read::new(types::Fd(file.as_raw_fd()), buf.as_mut_ptr(), len as _)
                .offset64(start as i64)
                .build();
            unsafe {
                submission.push(&read_e).unwrap();
            }
        }
        // Batch submit and wait here
        ring.submit_and_wait(batch_size).unwrap();
        // Drain the completion queue; the results are not inspected
        let completions = ring.completion();
        for _ in completions {}
    }

However, as I increase `batch_size`, reading the file takes more time, which does not make sense to me. According to Lord of the io_uring, batch submission reduces the number of system calls, so it should be faster. I would only expect it to become slower if building the batch took more time than the IO itself.

| batch_size | time (µs) |
|---|---|
| 1 | 2302266 |
| 8 | 2667608 |
| 64 | 2896108 |
| 256 | 3001141 |

I am using `IoUring` on Ubuntu 20.04 with kernel version 5.15.0-67-generic. The filesystem is btrfs on four NVMe SSDs in RAID 0. Each SSD can sustain 2.8 GB/s of sequential IO, so I am confident that the IO bottleneck has not been reached.
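To make that last claim concrete, here is a small sketch that converts the measured times into throughput and compares them with the array's nominal peak. It assumes that 3.6 GB means 3.6 × 10^9 bytes, that the whole file is read in every run, and that RAID 0 scales ideally to 4 × 2.8 GB/s; `throughput_gb_per_s` is a name made up for this illustration.

```rust
/// Throughput in GB/s (1 GB = 10^9 bytes) for `bytes` read in `micros` microseconds.
fn throughput_gb_per_s(bytes: f64, micros: f64) -> f64 {
    (bytes / 1e9) / (micros / 1e6)
}

fn main() {
    // Assumption: the full 3.6 GB (3.6e9 bytes) is read in every run.
    let file_bytes = 3.6e9;
    // (batch_size, elapsed microseconds), copied from the measurements above.
    let measurements = [
        (1, 2_302_266.0),
        (8, 2_667_608.0),
        (64, 2_896_108.0),
        (256, 3_001_141.0),
    ];
    // Assumption: ideal RAID 0 scaling across four 2.8 GB/s SSDs.
    let raid_peak = 4.0 * 2.8;
    for &(batch_size, micros) in measurements.iter() {
        let gbps = throughput_gb_per_s(file_bytes, micros);
        println!(
            "batch_size {:>3}: {:.2} GB/s ({:.0}% of {:.1} GB/s)",
            batch_size,
            gbps,
            100.0 * gbps / raid_peak,
            raid_peak
        );
    }
}
```

Under these assumptions every run lands at roughly 1.2 to 1.6 GB/s, well below both the 11.2 GB/s nominal peak of the array and the 2.8 GB/s of a single SSD.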
Update: Can a single thread reach the IO bottleneck? According to the blog, a single thread can only have one outstanding IO at a time. Does `IoUring` share this constraint? If so, is the only advantage of batch submission that it reduces the number of `enter` system calls?