I often come across huge gzip-compressed files in S3 buckets. They are basically text files compressed with gzip, and I want to download only part of a file (say, the first few hundred lines).

I didn't find any option in s3cmd that allows me to download a partial file, even when it is a plain text file without any compression.

The following is the Java code I have right now, which downloads the complete file. What else should I do here to download only part of a file that is in gzip format?

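    import java.io.*;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3Object;

    String outPutFile = "mylocalfile.txt";
    File file = new File(outPutFile);
    S3Object s3object = s3Client.getObject(new GetObjectRequest(bucketName, key));
    // try-with-resources closes both streams, which also flushes the buffered writer
    try (InputStream reader = new BufferedInputStream(s3object.getObjectContent());
         OutputStream writer = new BufferedOutputStream(new FileOutputStream(file))) {
        int read;
        // Copies the entire object, one byte at a time
        while ((read = reader.read()) != -1) {
            writer.write(read);
        }
    }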

1 Answer

Reading line by line through a GZIPInputStream solved my problem. Here is what I ended up with.

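    import java.io.*;
    import java.util.zip.GZIPInputStream;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3Object;

    int numOfLinesRead = 0;
    String outPutFile = "mylocalfile.txt";

    S3Object s3object = s3Client.getObject(new GetObjectRequest(bucketName, key));
    // Decompress on the fly and decode as UTF-8; try-with-resources closes every stream
    try (InputStream fileStream = new BufferedInputStream(s3object.getObjectContent());
         GZIPInputStream gzipStream = new GZIPInputStream(fileStream);
         InputStreamReader decoder = new InputStreamReader(gzipStream, "UTF-8");
         BufferedReader buffered = new BufferedReader(decoder);
         FileWriter writer = new FileWriter(outPutFile)) {
        String thisLine;
        // Check the limit first so no extra line is read and then discarded
        while (numOfLinesRead < numOfLinesToRead && (thisLine = buffered.readLine()) != null) {
            writer.write(thisLine + "\n");
            numOfLinesRead++;
        }
    }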
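The same idea can be tried without an S3 connection. The following is only a minimal sketch: the class name `GzipHead` and the helpers `headLines` and `gzip` are made up for illustration, and an in-memory byte array stands in for `s3object.getObjectContent()`. It shows that `GZIPInputStream` decompresses lazily, so reading can stop after the requested number of lines.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipHead {

    // Hypothetical helper: returns at most maxLines lines from a gzip-compressed stream.
    static List<String> headLines(InputStream gzipped, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            // Stop as soon as enough lines were collected or the data ends
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Hypothetical helper for the demo: gzip a string in memory.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("line 1\nline 2\nline 3\nline 4\n");
        // The byte array plays the role of the S3 object content here
        System.out.println(headLines(new ByteArrayInputStream(data), 2)); // prints [line 1, line 2]
    }
}
```

With a real bucket, the stream from `s3object.getObjectContent()` could presumably be passed to `headLines` in the same way. If the goal is also to bound the number of bytes requested, setting a byte range on the `GetObjectRequest` (for example with `setRange`) may help, but that is an assumption not tested here: a gzip stream cut off mid-file will likely end with an `EOFException` that the caller has to handle.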