56

On Linux, we generally use the head/tail commands to preview the contents of a file. They let you view part of the file (to inspect the format, for instance) rather than opening up the whole file.

In the case of Amazon S3, there seem to be only ls, cp, mv, etc. commands. I wanted to know if it is possible to view part of a file without downloading the entire file onto my local machine using cp/GET.

nutsiepully

8 Answers

115

One thing you could do is cp the object to stdout and then pipe it to head:

aws s3 cp s3://path/to/my/object - | head

You get a broken pipe error at the end, but it works.
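If the broken-pipe message bothers you, one option is to silence stderr; a small variant of the above (same placeholder path) that should work:

aws s3 cp s3://path/to/my/object - 2>/dev/null | head

Note that this also hides any genuine error output, so use it only for quick previews.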

dspringate
12

You can specify a byte range when retrieving data from S3 to get the first N bytes, the last N bytes, or anything in between. (This is also helpful since it allows you to download files in parallel – just start multiple threads or processes, each of which retrieves part of the total file.)

I don't know which of the various CLI tools support this directly, but a range retrieval does what you want.

The AWS CLI tools ("aws s3 cp" to be precise) do not allow you to do a range retrieval, but s3curl (http://aws.amazon.com/code/128) should do the trick. (So does plain curl, e.g., using the --range parameter, but then you would have to do the request signing on your own.)
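For illustration, a parallel range retrieval with plain aws s3api calls might look like the sketch below; the bucket, key, and byte offsets are placeholders and assume an object of roughly 100 MB:

# Fetch two halves of the object concurrently, then stitch them together
aws s3api get-object --bucket my-bucket --key big/file.bin --range bytes=0-49999999 part1 &
aws s3api get-object --bucket my-bucket --key big/file.bin --range bytes=50000000-99999999 part2 &
wait
cat part1 part2 > file.bin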

12

You can use the --range switch of the older s3api get-object command to bring back the first bytes of an S3 object. (AFAICT the higher-level aws s3 commands don't support the switch.)

The special file /dev/stdout can be passed as the target filename if you simply want to view the S3 object by piping to head. Here's an example:

aws s3api get-object --bucket mybucket_name --key path/to/the/file.log --range bytes=0-10000 /dev/stdout | head

Finally, if, like me, you're dealing with compressed .gz files, the above technique also works with zless, enabling you to view the head of the decompressed file:

aws s3api get-object --bucket mybucket_name --key path/to/the/file.log.gz --range bytes=0-10000 /dev/stdout | zless

One tip with zless: if it isn't working, try increasing the size of the range.

Ben Hutchison
10

If you don't want to download the whole file, you can download a portion of it with the --range option of the aws s3api command and, after that portion is downloaded, run head on the resulting file.

Example:

aws s3api get-object --bucket my_s3_bucket --key s3_folder/file.txt --range bytes=0-1000000 tmp_file.txt && head tmp_file.txt

Explanation:

The aws s3api get-object command downloads a portion of the S3 file from the specified bucket and S3 folder, with the size specified in --range, to the given output file. The && executes the second command only if the first one has succeeded. The second command prints the first 10 lines of the previously created output file.

Diligent Key Presser
Yaniv
6

If you are using s3cmd, you can use s3cmd get, write to stdout, and pipe it to head as follows:

s3cmd get s3://bucket/file - | head

If you want to view the head of a gzip file, pipe stdout through gzip -d - and then to head:

s3cmd get s3://bucket/file - | gzip -d - | head

If you get bored with this piping business, add the following script to your ~/.bashrc

function s3head {
    # The last argument is the S3 path; everything before it is passed to head
    s3_path=${@:$#}
    params=${@:1:$# - 1}
    # zcat -f decompresses gzip input and passes plain text through unchanged
    s3cmd get "$s3_path" - | zcat -f | head $params
}

Now source the ~/.bashrc file.

Simply running s3head s3://bucket/file will give you the first 10 lines of your file.

This even supports other head command parameters.

For example, if you want more lines, just specify -n followed by the number of lines, as follows:

# Prints the first 14 lines of s3://bucket/file
s3head -n 14 s3://bucket/file

Here are some other utility scripts for s3: https://github.com/aswathkk/dotfiles/blob/master/util_scripts/s3utils.sh

Aswath K
4

As others have answered, assuming the file is large, use the get-object command with --range bytes=0-1000 to download only part of the file.

Example:

aws s3api get-object --profile opsrep --region eu-west-1 --bucket <MY-BUCKET> --key <DIR/MY-FILE.CSV> --range bytes=0-10000 "OUTPUT.csv"

As of 2018, you can run S3 Select queries from the AWS CLI. Use LIMIT 10 to preview the "head" of your file.

Example:

aws s3api select-object-content --bucket <MY-BUCKET> --key <DIR/MY-FILE.CSV> --expression "select * from s3object limit 10" --expression-type "SQL" --input-serialization "CSV={}" --output-serialization "CSV={}" "OUTPUT.csv"

Now you can quickly run head OUTPUT.csv on the small local file.

knanne
2

One easy way to do it:

aws s3api get-object --bucket bucket_name --key path/to/file.txt --range bytes=0-10000 /path/to/local/t3.txt && head -100 /path/to/local/t3.txt

For a gz file, you can do:

aws s3api get-object --bucket bucket_name --key path/to/file.gz --range bytes=0-10000 /path/to/local/t3.gz && zcat /path/to/local/t3.gz | head -100

If you get less data than you need, increase the byte range.

Aklank Jain
0

There is no such capability. You can only retrieve the entire object. You can perform an HTTP HEAD request to view object metadata, but that isn't what you're looking for.
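For reference, a metadata-only request from the CLI looks roughly like this (bucket and key are placeholders); it returns fields such as ContentLength and LastModified, but none of the object's data:

aws s3api head-object --bucket my-bucket --key path/to/file.txt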

Ben Whaley