How to directly copy limited bytes from reader to writer

Question

I was trying to write a function that copies a user-specified number of bytes from a reader to a writer, and I came up with this:

fn io_copy(
    reader: &mut dyn std::io::Read,
    writer: &mut dyn std::io::Write,
    byte_count: usize,
) -> std::io::Result<()> {
    // Zero-initialized scratch buffer; reading into uninitialized memory is undefined behavior.
    let mut buffer = [0u8; 16384];
    let mut remaining = byte_count;
    while remaining > 0 {
        let to_read = if remaining > 16384 { 16384 } else { remaining };

        reader.read_exact(&mut buffer[0..to_read])?;
        remaining -= to_read;
        // write_all keeps writing until the whole chunk is out; a bare write may write only part of it.
        writer.write_all(&buffer[0..to_read])?;
    }

    Ok(())
}

It works fine, but I wanted to do it without an arbitrarily sized intermediate buffer, and I wondered if such a function already existed. I found std::io::copy, but that copies the whole stream, and I only want to copy a limited amount. I figured I could use take on the reader, but I'm having trouble getting rid of errors. This is what I have so far:

fn io_copy<R>(reader: &mut R, writer: &mut std::io::Write, byte_count: usize) -> std::io::Result<()>
where
    R: std::io::Read + Sized,
{
    let mut r = reader.by_ref().take(byte_count as u64);
    std::io::copy(&mut r, writer)?;
    Ok(())
}

This gives me an error:

error[E0507]: cannot move out of borrowed content
 --> src/lib.rs:6:21
  |
6 |         let mut r = reader.by_ref().take(byte_count as u64);
  |                     ^^^^^^^^^^^^^^^ cannot move out of borrowed content

I don't understand how to get around this.

"It works fine, but I wanted to do it without an arbitrarily sized intermediate buffer": I'm [sure](https://doc.rust-lang.org/stable/src/std/io/util.rs.html#48-68) `copy()` also uses a buffer, so I don't see any improvement. `take()` takes ownership of `self`; you can't "cheat" with `by_ref()`. It would be nice to have such functionality in std, though; you could request it by opening an issue on GitHub. — Stargateur, Sep 26 '18 at 08:03
(By the way, your first snippet could avoid the magic number by using `buffer.len()`.) — Stargateur, Sep 26 '18 at 08:07
@Stargateur: My hope would be that the standard library implementers could potentially take advantage of implementation details that they are more familiar with. For example, perhaps the reader is already using an internal buffer, and it could copy directly from that. — Benjamin Lindley, Sep 26 '18 at 08:20
When you read and write, you always need at least one buffer; an OS can't copy n bytes directly from one disk to another without one. The OS asks you for the buffer when you read and write, so there is no performance loss. One case where you do lose a little is networking, because the OS uses an internal buffer to hold incoming packets until you ask for them, but that is nothing to worry about either: the second buffer is unavoidable, since new packets can arrive while you are processing the previous ones. Don't worry, using a buffer is the way to do it. — Stargateur, Sep 26 '18 at 08:32
Your question made me think about another one: https://stackoverflow.com/questions/52515361/why-by-ref-take-usage-differs-for-iterator-and-read. The answer there may be helpful. — OlegTheCat, Sep 26 '18 at 13:10
From the duplicate: `io::copy(&mut input.by_ref().take(5), &mut output)?;` A [complete example](https://play.rust-lang.org/?gist=98a3c141754464d3f30154f172843167&version=stable&mode=debug&edition=2015). To use a function from a trait, the trait must be in scope. — Shepmaster, Sep 26 '18 at 14:01
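The idiom from the last comment can be wrapped in a function. This is a minimal sketch, assuming the `io_copy` name from the question and a `u64` count (the type `take` expects); returning the number of bytes copied is an addition for illustration, not part of the question's signature. The important detail is that `std::io::Read` is imported, so `take` resolves on the `&mut R` returned by `by_ref()` rather than on `R` itself.

```rust
use std::io::{self, Read, Write};

/// Copies at most `byte_count` bytes from `reader` to `writer`.
/// Returns the number of bytes actually copied, which is smaller than
/// `byte_count` if the reader runs out of data first.
fn io_copy<R, W>(reader: &mut R, writer: &mut W, byte_count: u64) -> io::Result<u64>
where
    R: Read,
    W: Write + ?Sized,
{
    // `by_ref()` yields `&mut R`, which itself implements `Read`, so `take`
    // consumes only that temporary borrow and the caller keeps the reader.
    io::copy(&mut reader.by_ref().take(byte_count), writer)
}
```

Because only a borrow is consumed, the caller can keep reading from `reader` afterwards, starting right after the copied bytes.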

score -1 · Answer 1 · answered Sep 26 '18 at 10:06

I don't think you're going to get much better than that with just the generic Read/Write interfaces (except that you probably shouldn't use read_exact for the intermediate chunks, where a partially filled buffer is fine; insisting on a full buffer might block unnecessarily).
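A rough sketch of that suggestion: the same fixed-buffer loop, but using plain `read` and forwarding whatever arrives. The function name `copy_chunked`, the 8192-byte buffer, and the choice to return the number of bytes copied (instead of failing on short input) are assumptions made for illustration.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Copies up to `byte_count` bytes through a fixed-size buffer using plain
/// `read` calls. Returns the number of bytes copied, which is less than
/// `byte_count` if the reader reaches end of input first.
fn copy_chunked<R, W>(reader: &mut R, writer: &mut W, byte_count: u64) -> io::Result<u64>
where
    R: Read + ?Sized,
    W: Write + ?Sized,
{
    let mut buffer = [0u8; 8192];
    let mut copied = 0u64;
    while copied < byte_count {
        // Never ask for more than what is still wanted or what the buffer holds.
        let want = (byte_count - copied).min(buffer.len() as u64) as usize;
        let n = match reader.read(&mut buffer[..want]) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Forward exactly what was read; `write_all` handles partial writes.
        writer.write_all(&buffer[..n])?;
        copied += n as u64;
    }
    Ok(copied)
}
```

A caller that needs exactly `byte_count` bytes can compare the return value against it and treat a shortfall as an error.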

You might be able to specialise it for specific Reads and Writes, though. For example, you might be able to use kernel functions (such as sendfile when reading from and writing to file descriptors on Linux), which can avoid copying the data through userspace unnecessarily.
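For the file-to-file case, one option that needs no platform-specific code is to hand `std::io::copy` concrete `File` values. The documentation for `std::io::copy` notes that on Linux it may use `copy_file_range`, `sendfile` or `splice` when possible; whether that fast path applies to a `Take` wrapper is an implementation detail that should not be relied on. The helper name `copy_file_prefix` below is hypothetical.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Copies at most the first `byte_count` bytes of the file at `src` into a
/// newly created (or truncated) file at `dst`. Returns the bytes copied.
fn copy_file_prefix(src: &Path, dst: &Path, byte_count: u64) -> io::Result<u64> {
    let input = File::open(src)?;
    let mut output = File::create(dst)?;
    // `take` limits the reader; `io::copy` decides how to move the bytes and
    // falls back to an ordinary buffered loop when no kernel path is usable.
    io::copy(&mut input.take(byte_count), &mut output)
}
```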

score -4 · Answer 2 · answered Sep 26 '18 at 08:07

My implementation of copy_n would look like this:

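use std::io::{self, Read, Write};

pub fn copy_n<R: ?Sized, W: ?Sized>(reader: &mut R, writer: &mut W, count: usize) -> io::Result<()>
    where R: Read, W: Write
{
    // Zero-filled buffer of exactly `count` bytes. Calling `set_len` on
    // uninitialized capacity and then reading into it is undefined behavior.
    let mut buf = vec![0u8; count];

    // Fill the whole buffer, or fail with `UnexpectedEof` on short input.
    reader.read_exact(&mut buf)?;
    // Write the whole buffer, or fail.
    writer.write_all(&buf)?;

    Ok(())
}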

This solution just uses read_exact and write_all, which guarantee that either the complete buffer is read/written or an error occurs, so it should be fine.
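A variant of the same all-or-error idea lets the vector grow as data arrives instead of sizing it from `count` up front, so a huge `count` on a short input does not allocate `count` bytes. This is only a sketch: the name `copy_n_buffered` and the `u64` count are assumptions, and it still holds everything in memory before writing, so it is only reasonable for small counts.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Reads exactly `count` bytes into memory, then writes them all.
/// Fails with `UnexpectedEof` if the reader ends early; nothing is
/// written in that case.
fn copy_n_buffered<R, W>(reader: &mut R, writer: &mut W, count: u64) -> io::Result<()>
where
    R: Read,
    W: Write + ?Sized,
{
    let mut buf = Vec::new();
    // `take` caps the read at `count`; `read_to_end` grows `buf` as needed.
    let read = reader.by_ref().take(count).read_to_end(&mut buf)?;
    if (read as u64) < count {
        return Err(io::Error::new(
            ErrorKind::UnexpectedEof,
            "reader ended before `count` bytes were read",
        ));
    }
    writer.write_all(&buf)
}
```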

hellow

What if your file is big? It's not recommended to read and write a file with a buffer the size of the whole file. – Stargateur Sep 26 '18 at 08:10
You're right. Would writing one byte after another be a better solution? :/ `C++` does it that way – hellow Sep 26 '18 at 08:13
I doubt that C++ copies byte by byte. A good solution is already present in the question; the buffer size is generally one or two pages, so 4096 or 8192 bytes. In this case the OP chose four pages. – Stargateur Sep 26 '18 at 08:16
https://github.com/llvm-mirror/libcxx/blob/master/include/algorithm#L1701-L1725 – hellow Sep 26 '18 at 08:19
Yes, but that is for the case where the iterator can't be randomly accessed... just after it there is an implementation that doesn't copy byte by byte, for random-access iterators. No language could do better. – Stargateur Sep 26 '18 at 08:46
@Stargateur It's not just that, the OutputIterator in this case can be anything at all too, there's actually just no way to write multiple things at once to an arbitrary OutputIterator. `std::io::{Read,Write}` provide a better abstraction for IO specifically, at the cost of being less general. So yeah, no, the C++ implementation can't really do better with what it's trying to do. – Cubic Sep 26 '18 at 09:55
