56

On Linux, we generally use the head/tail commands to preview the contents of a file. They let you view part of the file (to inspect the format, for instance) rather than opening up the whole file.

In the case of Amazon S3, there seem to be only ls, cp, mv, etc. commands. I wanted to know if it is possible to view part of a file without downloading the entire file onto my local machine using cp/GET.

nutsiepully

8 Answers

115

One thing you could do is cp the object to stdout and then pipe it to head:

aws s3 cp s3://path/to/my/object - | head

You get a broken pipe error at the end, but it works.
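If the broken-pipe message bothers you, one option is to silence stderr; a small variant of the above (same placeholder path) that should work:

aws s3 cp s3://path/to/my/object - 2>/dev/null | head

Note that this also hides any genuine error output, so use it only for quick previews.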

dspringate
12

You can specify a byte range when retrieving data from S3 to get the first N bytes, the last N bytes, or anything in between. (This is also helpful since it allows you to download files in parallel – just start multiple threads or processes, each of which retrieves part of the total file.)

I don't know which of the various CLI tools support this directly, but a range retrieval does what you want.

The AWS CLI tools ("aws s3 cp" to be precise) do not allow you to do a range retrieval, but s3curl (http://aws.amazon.com/code/128) should do the trick. (So does plain curl, e.g., using the --range parameter, but then you would have to do the request signing on your own.)
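For illustration, a parallel range retrieval with plain aws s3api calls might look like the sketch below; the bucket, key, and byte offsets are placeholders and assume an object of roughly 100 MB:

# Fetch two halves of the object concurrently, then stitch them together
aws s3api get-object --bucket my-bucket --key big/file.bin --range bytes=0-49999999 part1 &
aws s3api get-object --bucket my-bucket --key big/file.bin --range bytes=50000000-99999999 part2 &
wait
cat part1 part2 > file.bin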

12

You can use the --range switch of the older s3api get-object command to bring back the first bytes of an S3 object. (AFAICT the higher-level aws s3 commands don't support the switch.)

The special file /dev/stdout can be passed as the target filename if you simply want to view the S3 object by piping to head. Here's an example:

aws s3api get-object --bucket mybucket_name --key path/to/the/file.log --range bytes=0-10000 /dev/stdout | head

Finally, if, like me, you're dealing with compressed .gz files, the above technique also works with zless, enabling you to view the head of the decompressed file:

aws s3api get-object --bucket mybucket_name --key path/to/the/file.log.gz --range bytes=0-10000 /dev/stdout | zless

One tip with zless: if it isn't working, try increasing the size of the range.

Ben Hutchison
10

If you don't want to download the whole file, you can download a portion of it with the --range option of the aws s3api command and, after that portion is downloaded, run head on the resulting file.

Example:

aws s3api get-object --bucket my_s3_bucket --key s3_folder/file.txt --range bytes=0-1000000 tmp_file.txt && head tmp_file.txt

Explanation:

The aws s3api get-object command downloads a portion of the S3 file from the specified bucket and S3 folder, with the size specified in --range, to the given output file. The && executes the second command only if the first one has succeeded. The second command prints the first 10 lines of the previously created output file.

Diligent Key Presser
Yaniv
6

If you are using s3cmd, you can use s3cmd get, write to stdout, and pipe it to head as follows:

s3cmd get s3://bucket/file - | head

If you want to view the head of a gzip file, pipe stdout through gzip -d - and then to head:

s3cmd get s3://bucket/file - | gzip -d - | head

If you get bored with this piping business, add the following script to your ~/.bashrc

function s3head {
    # The last argument is the S3 path; everything before it is passed to head
    s3_path=${@:$#}
    params=${@:1:$# - 1}
    # zcat -f decompresses gzip input and passes plain text through unchanged
    s3cmd get "$s3_path" - | zcat -f | head $params
}

Now source the ~/.bashrc file.

Simply running s3head s3://bucket/file will give you the first 10 lines of your file.

This even supports other head command parameters.

For example, if you want more lines, just specify -n followed by the number of lines, as follows:

# Prints the first 14 lines of s3://bucket/file
s3head -n 14 s3://bucket/file

Here are some other utility scripts for s3: https://github.com/aswathkk/dotfiles/blob/master/util_scripts/s3utils.sh

Aswath K
4

As others have answered, assuming the file is large, use the get-object command with --range bytes=0-1000 to download only part of the file.

Example:

aws s3api get-object --profile opsrep --region eu-west-1 --bucket <MY-BUCKET> --key <DIR/MY-FILE.CSV> --range bytes=0-10000 "OUTPUT.csv"

As of 2018, you can run S3 Select queries from the AWS CLI. Use LIMIT 10 to preview the "head" of your file.

Example:

aws s3api select-object-content --bucket <MY-BUCKET> --key <DIR/MY-FILE.CSV> --expression "select * from s3object limit 10" --expression-type "SQL" --input-serialization "CSV={}" --output-serialization "CSV={}" "OUTPUT.csv"

Now you can quickly run head OUTPUT.csv on the small local file.

knanne
2

One easy way to do it:

aws s3api get-object --bucket bucket_name --key path/to/file.txt --range bytes=0-10000 /path/to/local/t3.txt && head -100 /path/to/local/t3.txt

For a gz file, you can do:

aws s3api get-object --bucket bucket_name --key path/to/file.gz --range bytes=0-10000 /path/to/local/t3.gz && zcat /path/to/local/t3.gz | head -100

If you get less data than you need, increase the byte range.

Aklank Jain
0

There is no such capability. You can only retrieve the entire object. You can perform an HTTP HEAD request to view object metadata, but that isn't what you're looking for.
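For reference, a metadata-only request from the CLI looks roughly like this (bucket and key are placeholders); it returns fields such as ContentLength and LastModified, but none of the object's data:

aws s3api head-object --bucket my-bucket --key path/to/file.txt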

Ben Whaley