I'm writing a service that downloads images, joins them into a zip archive, and uploads the archive back to AWS. The service should be time-efficient. My first version was dead simple:
- Download all files in parallel and save them to disk.
- Read all files from disk, join them into a zip archive, and save it back to disk.
- Read the archive from disk and send it to S3.
But I suspect all those disk writes and reads are less performant than in-memory communication.
I joined downloading and archiving together (all download readers feed directly into the archiver), but I can't figure out how to connect this with the uploader.
The S3 uploader needs an io.ReadSeeker to put an object. The current implementation of the archiver is:
func Archive(inputQueue <-chan Input) io.ReadSeeker {
	zipFile, err := os.Create("test_arch.zip")
	if log.Error(err) {
		os.Exit(1)
	}
	arch := zip.NewWriter(zipFile)
	go func() {
		defer arch.Close()
		for input := range inputQueue {
			header := &zip.FileHeader{
				Name:   filepath.Join(baseDir, input.Path()),
				Method: zip.Store,
			}
			writer, err := arch.CreateHeader(header)
			if log.Error(err) {
				os.Exit(1)
			}
			_, err = io.Copy(writer, input.Reader())
			if log.Error(err) {
				os.Exit(1)
			}
		}
	}()
	return zipFile
}
It saves the archive to disk. How can I write the archive to an intermediate in-memory structure and pass that structure to the S3 uploader, which requires an io.ReadSeeker?