2

Is there an API to read an AWS S3 file in Go? I can only find the API that downloads the file to the local machine so you then read the downloaded local file, but I need to read the file as a stream (like reading a local file).

I want to read the file in real time: read 100 bytes, do something with those 100 bytes, then read the next 100 bytes, and so on. The only Go AWS S3 API I can find downloads the entire file to the local machine and then handles the downloaded local file.

My current test code is this:

package main

import (
    "bufio"
    "fmt"
    "os"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func exitErrorf(msg string, args ...interface{}) {
    fmt.Fprintf(os.Stderr, msg+"\n", args...)
    os.Exit(1)
}

func main() {
    bucket := "private bucket"
    item := "private item"

    file, err := os.Create("local path")
    if err != nil {
        exitErrorf("Unable to open file %q, %v", item, err)
    }
    defer file.Close()

    sess, err := session.NewSession(&aws.Config{
        Region: aws.String(" "),
    })
    if err != nil {
        exitErrorf("Unable to create session, %v", err)
    }

    downloader := s3manager.NewDownloader(sess)

    numBytes, err := downloader.Download(file,
        &s3.GetObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(item),
        })
    if err != nil {
        exitErrorf("Unable to download item %q, %v", item, err)
    }
    fmt.Println("Downloaded", numBytes, "bytes")

    // Make sure we read the downloaded file from the start
    if _, err := file.Seek(0, 0); err != nil {
        exitErrorf("Unable to seek, %v", err)
    }

    // Handle the downloaded file
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        // Do something with scanner.Bytes()
    }
}

This downloads the file from S3 to the local machine, then opens the downloaded file and handles each byte.

I wonder whether I can directly read each line of the file (or each 100 bytes of it) from S3.

Jonathan Hall
lihaichao
  • Reading a file (from the Internet) and downloading a file are synonymous. Please be clearer about what you need. – Jonathan Hall Feb 03 '20 at 06:19
  • Downloading a file means reading a file at a remote location. When you download a file, you read it. – Burak Serdar Feb 03 '20 at 06:21
  • Is this clearer? Sorry for not stating my question clearly. – lihaichao Feb 03 '20 at 06:25
  • No, that doesn't really help. The only possible way to "download" is by reading, in real time. Please show your code, and explain exactly what it doesn't do that you want it to do. – Jonathan Hall Feb 03 '20 at 06:45
  • @Flimzy, the problem is that downloader.Download takes an io.WriterAt. It's not obvious how to turn that into an io.Reader. As far as I can see there are only partial solutions on SO, such as https://stackoverflow.com/questions/46019484. – Peter Feb 03 '20 at 08:18
  • @Peter I will try to implement my own io.WriterAt, thanks! – lihaichao Feb 03 '20 at 13:11
  • @Peter: You connect an io.Writer to an io.Reader using io.Pipe. https://stackoverflow.com/q/31259812/13860 – Jonathan Hall Feb 03 '20 at 15:45
  • @Flimzy, I'm aware, but this still only provides an io.Writer, not an io.WriterAt. – Peter Feb 03 '20 at 15:58
  • @Peter, is my answer what you are looking for? – SysCoder Feb 14 '21 at 21:26
  • @SysCoder, not at all, because the WriteAt method doesn't behave as expected. It assumes that offset is always zero. If memory serves, the aws package downloads multiple chunks concurrently, so this shouldn't work. – Peter Feb 14 '21 at 22:25
  • @Peter, that is why you need to set `downloader.Concurrency = 1` "Specifying a Downloader.Concurrency of 1 will cause the Downloader to download the parts from S3 sequentially." https://docs.aws.amazon.com/sdk-for-go/api/service/s3/s3manager/ – SysCoder Feb 15 '21 at 20:28
  • @SysCoder, sequentially doesn't mean all at once. Given a large enough file [WriteAt is still called with specific offsets](https://github.com/aws/aws-sdk-go/blob/v1.37.10/service/s3/s3manager/download.go#L583), even with concurrency set to one. – Peter Feb 15 '21 at 20:55
  • @Peter, Yes, it is okay that WriteAt is called with different specific offsets. As long as the offsets are sequential, things will be fine. The offset is disregarded because we know everything will be sequential when data is given to the Writer function that is wrapped in a WriterAt function. The reason the offset is used when concurrency is greater than 1 is to be able to write the chunks possibly out of order and in parallel. Concurrency is set to one, so things will be written to WriteAt in order and one at a time. – SysCoder Feb 15 '21 at 21:42

2 Answers

3

Download() takes an io.WriterAt, but you want an io.Reader to read from. You can achieve this in four steps:

Create a fake WriterAt to wrap a Writer:

type FakeWriterAt struct {
    w io.Writer
}

// WriteAt ignores the offset and forwards every write to the wrapped
// io.Writer. This is only safe when the parts arrive in order,
// i.e. with Concurrency set to 1.
func (fw FakeWriterAt) WriteAt(p []byte, offset int64) (n int, err error) {
    return fw.w.Write(p)
}

Create an io.Pipe to have the ability to read what is written to a writer:

r, w := io.Pipe()

Set concurrency to one so the download will be sequential:

downloader.Concurrency = 1

Wrap the writer created with io.Pipe() in the FakeWriterAt from the first step and pass it to the Download function. Run the download in a goroutine, because writes to the pipe block until the reader on the other end consumes them:

go func() {
    _, err := downloader.Download(FakeWriterAt{w},
        &s3.GetObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(key),
        })
    // Close the write side so the reader sees EOF, or the download error if any.
    w.CloseWithError(err)
}()

You can now use the reader from the io.Pipe to read from S3.
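
For example, a minimal sketch of consuming r in 100-byte chunks, matching the question's use case (assuming the r from io.Pipe() above; the processing step is a placeholder):

buf := make([]byte, 100)
for {
    n, err := io.ReadFull(r, buf)
    if n > 0 {
        // do something with buf[:n]
    }
    if err != nil {
        // io.EOF / io.ErrUnexpectedEOF mean the object has been fully read;
        // any other error is the one passed to CloseWithError in the goroutine.
        break
    }
}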

The minimum part size is 5 MB according to the documentation.
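
The part size is controlled by the downloader's PartSize field; a sketch (the value shown is just the 5 MB default):

// The object is downloaded in parts of PartSize bytes (default 5 MB).
downloader.PartSize = 5 * 1024 * 1024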

Reference: https://dev.to/flowup/using-io-reader-io-writer-in-go-to-stream-data-3i7b

SysCoder
1

As far as I understand, you probably need a Range request to get the file chunk by chunk.
Here is some pseudo-code:

// Set up the request for the object
input := &s3.GetObjectInput{
    Bucket: aws.String(BucketName),
    Key:    aws.String(Path),
}

// Ask only for the bytes in [start, end]; the Range header is inclusive
input.Range = aws.String(fmt.Sprintf("bytes=%d-%d", start, end))

// Get that particular chunk of the object
// (svc is an *s3.S3 client, e.g. svc := s3.New(sess))
result, err := svc.GetObject(input)
if err != nil {
    return nil, err
}
defer result.Body.Close()

// Read the chunk
b, err := ioutil.ReadAll(result.Body)

Or, if for some reason you need the whole file at once (I can't recommend it), just omit Range and that's it.
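
If you do want to stream the whole object this way, you can loop over successive ranges until a short or failing chunk signals the end. A minimal sketch, assuming a 100-byte chunk size and an *s3.S3 client named svc (both placeholders):

const chunkSize = 100

var start int64
for {
    input := &s3.GetObjectInput{
        Bucket: aws.String(BucketName),
        Key:    aws.String(Path),
        // the end of the range is inclusive
        Range: aws.String(fmt.Sprintf("bytes=%d-%d", start, start+chunkSize-1)),
    }

    result, err := svc.GetObject(input)
    if err != nil {
        // S3 returns InvalidRange once start is past the end of the object
        break
    }

    b, err := ioutil.ReadAll(result.Body)
    result.Body.Close()
    if err != nil {
        break
    }

    // do something with the chunk in b

    if len(b) < chunkSize {
        break // short chunk: end of object
    }
    start += chunkSize
}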

tacobot