
Occasionally we need to copy huge files from one bucket to another in AWS S3. Whenever possible we use the CopyRequest to have AWS handle the operation entirely on its side (no round trip back to the client is required). But sometimes we don't have that option because we need to copy between two completely separate accounts, which requires a GET followed by a PUT.

Problems:

  1. The response stream returned from the GET is not seekable, so it cannot be passed to the PUT request to stream seamlessly from one to the other
  2. Copying the response stream to an intermediary stream (a MemoryStream) using CopyTo() and then passing that to the PUT operation works, but doesn't scale: large files throw an OutOfMemoryException

So what I need is an intermediary stream that I can read from and write to at the same time: I would read a chunk from the response stream and write it to the intermediary stream, while the PUT request concurrently reads the content back out the other end; in effect, a seamless pass-through.
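One way to sketch such a pass-through is a custom Stream backed by a bounded BlockingCollection: the writer blocks when the queue is full, so memory use stays capped regardless of file size. This is a hypothetical sketch, not part of the AWS SDK; the class name and capacity are my own.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Sketch: a one-writer/one-reader pass-through stream. Memory is bounded
// at roughly (boundedCapacity * chunk size), no matter how large the file is.
public class BoundedPassThroughStream : Stream
{
    private readonly BlockingCollection<byte[]> _chunks;
    private byte[] _current;  // chunk currently being drained by Read
    private int _offset;      // read position within _current

    public BoundedPassThroughStream(int boundedCapacity = 16)
    {
        _chunks = new BlockingCollection<byte[]>(boundedCapacity);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        byte[] chunk = new byte[count];
        Buffer.BlockCopy(buffer, offset, chunk, 0, count);
        _chunks.Add(chunk);   // blocks while the bounded queue is full
    }

    // Call once the producer has written everything, to signal end-of-stream.
    public void CompleteWriting() { _chunks.CompleteAdding(); }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (_current == null || _offset == _current.Length)
        {
            // Blocks until a chunk arrives; returns false when writing
            // is complete and the queue is empty (end of stream).
            if (!_chunks.TryTake(out _current, Timeout.Infinite))
                return 0;
            _offset = 0;
        }
        int n = Math.Min(count, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, buffer, offset, n);
        _offset += n;
        return n;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanWrite { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override void Flush() { }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override long Seek(long o, SeekOrigin s) { throw new NotSupportedException(); }
    public override void SetLength(long v) { throw new NotSupportedException(); }
}
```

One caveat: the S3 PUT generally needs to know the content length up front, and a non-seekable stream can't report it, so you'd likely have to set the length on the request explicitly (taken from the GET response metadata).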

I found this post on Stack Overflow, and it seemed promising at first, but it still throws an OutOfMemoryException with large files.

.NET Asynchronous stream read/write

Has anyone ever had to do something similar to this? How would you tackle it? Thanks in advance.

snappymcsnap
    This is close to BufferedStream. Except seek, not supported if the input stream doesn't support it. It isn't very clear why you'd need to seek and what range needs to be seekable. – Hans Passant Aug 17 '12 at 17:26

2 Answers


It's not clear why you would want to use MemoryStream. The Stream.CopyTo method in .NET 4 doesn't need to use an intermediate stream - it will just read into a local buffer of a fixed size, then write that buffer to the output stream, then read more data (overwriting the buffer) etc.

If you're not using .NET 4, it's easy to implement something similar, e.g.

public static void CopyTo(this Stream input, Stream output)
{
    byte[] buffer = new byte[64 * 1024]; // 64K buffer
    int bytesRead;
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, bytesRead);
    }
}
Jon Skeet
  • the main reason for having MemoryStream as an intermediate between the inbound response stream and the outbound stream for the PUT request is that the former does not support 'seek' operations so it cannot be directly passed as the source to the PUT request. Does that make sense? – snappymcsnap Aug 17 '12 at 15:30
  • @snappymcsnap: So can you not *write* to the PUT request as a stream? – Jon Skeet Aug 17 '12 at 15:32
  • the PutObjectRequest object only has a property to specify the InputStream to use, it does not itself support direct Write operations – snappymcsnap Aug 17 '12 at 15:35
  • @snappymcsnap: Okay, that makes more sense. It sounds like you may need to write to disk as an intermediate file then. – Jon Skeet Aug 17 '12 at 15:36
  • I was hoping there was a more elegant way to do where I could just funnel bytes into one end while they were read out the other. I appreciate your help though – snappymcsnap Aug 17 '12 at 15:40
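The disk-backed approach suggested in the comments above could look roughly like the following. This is a sketch only: the request/response member names (BucketName, Key, ResponseStream, InputStream) are assumptions based on the era of the AWS SDK for .NET discussed in this thread, so check them against the version you're actually using.

```csharp
// Sketch: use a temp file as the seekable intermediary between GET and PUT.
string tempPath = Path.GetTempFileName();
try
{
    // Stream the GET response to disk with bounded memory use.
    using (GetObjectResponse response = sourceClient.GetObject(
        new GetObjectRequest { BucketName = "source-bucket", Key = key }))
    using (FileStream file = File.Create(tempPath))
    {
        response.ResponseStream.CopyTo(file);
    }

    // A FileStream is seekable, so it can be handed straight to the PUT.
    using (FileStream file = File.OpenRead(tempPath))
    {
        destClient.PutObject(new PutObjectRequest
        {
            BucketName = "dest-bucket",
            Key = key,
            InputStream = file
        });
    }
}
finally
{
    File.Delete(tempPath);  // clean up the intermediate file
}
```

The trade-off is an extra full write and read of the file on local disk, but memory stays flat and the PUT gets the seekable source it wants.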

I found this, but it uses a Queue internally, which the author notes is an order of magnitude slower than a MemoryStream.

http://www.codeproject.com/Articles/16011/PipeStream-a-Memory-Efficient-and-Thread-Safe-Stre
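Assuming the linked PipeStream behaves like an ordinary Stream with blocking reads and writes (as the article describes), wiring it between the GET and the PUT might look like this; the getResponse, putRequest, and destClient variables are placeholders for your own SDK objects.

```csharp
// Sketch: a producer thread fills the pipe from the GET response while
// the PUT request drains it concurrently on the calling thread.
var pipe = new PipeStream();   // type from the linked CodeProject article

var producer = new Thread(() =>
{
    using (Stream source = getResponse.ResponseStream)
    {
        source.CopyTo(pipe);   // chunked copy; blocks when the pipe is full
    }
    pipe.Close();              // signals end-of-stream to the reader
});
producer.Start();

putRequest.InputStream = pipe; // the PUT reads as the producer writes
destClient.PutObject(putRequest);
producer.Join();
```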

I keep hoping I'll find an official MS library solution, but it seems that this wheel hasn't been properly invented yet.

Mike Asdf