
I'm starting a bash script which will take a path in S3 (as specified to the ls command) and dump the contents of all of the file objects to stdout. Essentially I'd like to replicate cat /path/to/files/*, except for S3, e.g. s3cat '/bucket/path/to/files/*'. My first inclination, looking at the options, is to use the cp command to copy each object to a temporary file and then cat that.

Has anyone tried this or something similar, or is there already a command I'm not finding that does it?

Pat Myron
Neil C. Obremski

5 Answers


dump the contents of all of the file objects to stdout.

You can accomplish this by passing - as the destination of the aws s3 cp command. For example: $ aws s3 cp s3://mybucket/stream.txt -

Is what you're trying to do something like this?

#!/bin/bash

BUCKET=YOUR-BUCKET-NAME
# List every key under the prefix, then stream each object to stdout.
for key in $(aws s3api list-objects --bucket "$BUCKET" --prefix bucket/path/to/files/ | jq -r '.Contents[].Key')
do
  echo "$key"
  aws s3 cp "s3://$BUCKET/$key" - | md5sum
done
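
Building on the same idea, a rough sketch of the s3cat-style wrapper from the question might look like this (the bucket and prefix are taken as arguments, and it assumes a CLI version that accepts - as the destination):

#!/bin/bash
# Usage: s3cat BUCKET PREFIX
# Stream every object under the prefix to stdout, like cat /path/to/files/*
BUCKET="$1"
PREFIX="$2"
for key in $(aws s3api list-objects --bucket "$BUCKET" --prefix "$PREFIX" | jq -r '.Contents[].Key')
do
  aws s3 cp "s3://$BUCKET/$key" -
done
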
quiver
  • Note however that '-' as a placeholder for stdout does not work in all versions of awscli. For example, version 1.2.9, which ships with Ubuntu 14.04.2 LTS, doesn't support it. – Antonio Barbuzzi Jul 06 '15 at 16:40
  • Ditto that. I'm on Ubuntu 12.x, and it does not work in my instance of bash. – Kode Charlie Nov 12 '15 at 17:46
  • The problem with this is that you can't get a specific version of the file. – Eamorr Jul 04 '16 at 22:17
  • Not working on macOS High Sierra 10.13.6 either (`aws --version`: `aws-cli/1.15.40 Python/3.6.5 Darwin/17.7.0 botocore/1.10.40`). – MichaelChirico Aug 03 '18 at 06:07
  • This answer also has the advantage that the file content is streamed to your terminal rather than copied as a whole. See more at https://loige.co/aws-command-line-s3-content-from-stdin-or-to-stdout/#pipeline-processing-of-s3-files – Khoa Jan 29 '19 at 06:25

If you are using a version of the AWS CLI that doesn't support copying to "-", you can also use /dev/stdout:

$ aws s3 cp --quiet s3://mybucket/stream.txt /dev/stdout

You may also want the --quiet flag (used above) to prevent a summary line like the following from being appended to your output:

download: s3://mybucket/stream.txt to ../../dev/stdout
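
Since nothing touches the local disk, the stream can feed straight into a pipeline. For example (the object name here is only illustrative):

$ aws s3 cp --quiet s3://mybucket/logs/app.log.gz /dev/stdout | gunzip | head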

Drew

You can try using s3streamcat; it supports bzip, gzip and xz formats as well.

Install with

sudo pip install s3streamcat

Usage:

s3streamcat s3://bucketname/dir/file_path
s3streamcat s3://bucketname/dir/file_path | more
s3streamcat s3://bucketname/dir/file_path | grep something
Scott Stensland
samarth

If you wish to accomplish this using bash, you'll have to call out to an external app such as the AWS Command-Line Interface (CLI). It does not have a cat equivalent, so you would need to copy the file locally and then cat it.
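
A minimal sketch of that copy-then-cat approach (the bucket and key are placeholders):

#!/bin/bash
# Download the object to a temporary file, print it, then clean up.
tmp=$(mktemp)
aws s3 cp "s3://mybucket/path/to/file.txt" "$tmp"
cat "$tmp"
rm -f "$tmp"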

Alternatively, you could use or write an app that directly calls the AWS SDK, which is available for languages such as Python, PHP, and Java. By using the SDK, file contents can be retrieved in memory and then sent to stdout.

John Rotenstein
  • The answer above shows that you can use 'cp' with '-' as the 2nd file argument to make it output the file to stdout. – Asfand Qazi Jan 14 '16 at 16:29

Ah ha!

https://pypi.python.org/pypi/s3cat/1.0.8


Neil C. Obremski