51

I used to be a happy s3cmd user. However, recently when I try to transfer a large zip file (~7 GB) to Amazon S3, I get this error:

$> s3cmd put thefile.tgz s3://thebucket/thefile.tgz

....
  20480 of 7563176329     0% in    1s    14.97 kB/s  failed
WARNING: Upload failed: /thefile.tgz ([Errno 32] Broken pipe)
WARNING: Retrying on lower speed (throttle=1.25)
WARNING: Waiting 15 sec...
thefile.tgz -> s3://thebucket/thefile.tgz  [1 of 1]
       8192 of 7563176329     0% in    1s     5.57 kB/s  failed
ERROR: Upload of 'thefile.tgz' failed too many times. Skipping that file.

I am using the latest s3cmd on Ubuntu.

Why does this happen, and how can I solve it? If it is unresolvable, what alternative tool can I use?

qliq
  • 11,695
  • 15
  • 54
  • 66

15 Answers

57

Now, in 2014, the AWS CLI can upload big files in place of s3cmd.

http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html has install / configure instructions, or often:

$ wget https://s3.amazonaws.com/aws-cli/awscli-bundle.zip
$ unzip awscli-bundle.zip
$ sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
$ aws configure

followed by

$ aws s3 cp local_file.tgz s3://thereoncewasans3bucket

will get you satisfactory results.
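
If uploads still struggle on a flaky connection, the CLI's multipart behaviour is tunable; a sketch, with the sizes below being arbitrary example values:

$ # use smaller multipart chunks on unreliable links (example values)
$ aws configure set default.s3.multipart_threshold 64MB
$ aws configure set default.s3.multipart_chunksize 16MB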

user116293
  • 5,534
  • 4
  • 25
  • 17
  • +1! I have a 110GB file I need to back up on a consistent basis, and doing it in parts is terrible. The above solution is great! – Geesu Sep 17 '14 at 15:18
  • I just spent about an hour on chat with AWS support and they actually chatted me this SO article! Even though my files were < 100Mb and this error just came out of nowhere... Installing AWS CLI and switching to that solved the problem. – Dave Collins Apr 05 '15 at 04:56
28

I've just come across this problem myself. I've got a 24GB .tar.gz file to put into S3.

Uploading smaller pieces will help.

There is also a ~5 GB limit on single uploads, so I'm splitting the file into pieces that can be re-assembled when they are downloaded later.

split -b100m ../input-24GB-file.tar.gz input-24GB-file.tar.gz-

The last part of that line is a 'prefix'. Split will append 'aa', 'ab', 'ac', etc. to it. The -b100m means 100 MB chunks. A 24 GB file will end up with about 240 100 MB parts, named 'input-24GB-file.tar.gz-aa' through 'input-24GB-file.tar.gz-jf'.
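
To push the parts up, a plain shell loop is enough (a sketch; the bucket name is a placeholder):

# upload each 100MB part to the bucket
for part in input-24GB-file.tar.gz-*; do
    s3cmd put "$part" s3://thebucket/
done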

To combine them later, download them all into a directory and:

cat input-24GB-file.tar.gz-* > input-24GB-file.tar.gz

It is also worth taking md5sums of the original and split files and storing them in the S3 bucket, or better, if the archive is not too big, using a system like parchive so you can check for, and even fix, some download problems.
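
For the checksum idea, a minimal sketch (file names follow the split example above):

# record checksums before uploading, and keep them alongside the parts
md5sum input-24GB-file.tar.gz input-24GB-file.tar.gz-* > input-24GB-file.md5
s3cmd put input-24GB-file.md5 s3://thebucket/
# after downloading and re-assembling, verify
md5sum -c input-24GB-file.md5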

Alister Bulman
  • 34,482
  • 9
  • 71
  • 110
  • 1
    Thanks Alister. I didn't know of ~5Gig file size limit. So no problem with s3cmd :) – qliq Apr 29 '11 at 02:32
  • 4
    I believe it is a limitation of s3cmd, as Amazon allows files of several terabytes. – philfreo May 06 '11 at 01:45
  • The file being so big may be one reason. But I experienced the problem with files as small as 100MB. – qliq Nov 09 '11 at 04:32
  • It's all about the network. On AWS there are usually fewer problems, but outside of the local network, all bets are off. You may want to split files up even smaller. – Alister Bulman Nov 09 '11 at 16:22
  • split -b5G worked for me. Tried -b10G but failed. 5Giga then. – Xavi Montero Jan 10 '13 at 10:05
  • 4
    As of right now, S3 accepts files up to 5 TB, but can only accept single uploads up to 5 GB. Larger requires multi-part upload. http://aws.amazon.com/s3/faqs/#How_much_data_can_I_store – Leopd Aug 30 '13 at 14:49
16

I tried all of the other answers but none worked. It looks like s3cmd is fairly sensitive. In my case the S3 bucket was in the EU. Small files would upload, but when the upload got to ~60k it always failed.

When I changed ~/.s3cfg it worked.

Here are the changes I made:

host_base = s3-eu-west-1.amazonaws.com

host_bucket = %(bucket)s.s3-eu-west-1.amazonaws.com
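
After editing ~/.s3cfg, a quick sanity check that the new endpoint is actually being used (the bucket name is just a placeholder):

# quick connectivity check against the configured endpoint
s3cmd info s3://thebucket
s3cmd ls s3://thebucket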

Ger Hartnett
  • 446
  • 3
  • 4
10

I had the same problem with ubuntu s3cmd.

s3cmd --guess-mime-type --acl-public put test.zip s3://www.jaumebarcelo.info/teaching/lxs/test.zip
test.zip -> s3://www.jaumebarcelo.info/teaching/lxs/test.zip  [1 of 1]
 13037568 of 14456364    90% in  730s    17.44 kB/s  failed
WARNING: Upload failed: /teaching/lxs/test.zip (timed out)
WARNING: Retrying on lower speed (throttle=0.00)
WARNING: Waiting 3 sec...
test.zip -> s3://www.jaumebarcelo.info/teaching/lxs/test.zip  [1 of 1]
  2916352 of 14456364    20% in  182s    15.64 kB/s  failed
WARNING: Upload failed: /teaching/lxs/test.zip (timed out)
WARNING: Retrying on lower speed (throttle=0.01)
WARNING: Waiting 6 sec...

The solution was to update s3cmd with the instructions from s3tools.org:

Debian & Ubuntu

Our DEB repository has been carefully created in the most compatible way – it should work for Debian 5 (Lenny), Debian 6 (Squeeze), Ubuntu 10.04 LTS (Lucid Lynx) and for all newer and possibly for some older Ubuntu releases. Follow these steps from the command line:

  • Import S3tools signing key:

    wget -O- -q http://s3tools.org/repo/deb-all/stable/s3tools.key | sudo apt-key add -

  • Add the repo to sources.list:

    sudo wget -O/etc/apt/sources.list.d/s3tools.list http://s3tools.org/repo/deb-all/stable/s3tools.list

  • Refresh package cache and install the newest s3cmd:

    sudo apt-get update && sudo apt-get install s3cmd
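
Afterwards it is worth confirming which version actually ended up on your PATH, since multipart support only arrived with the 1.1.0 betas (see the answers below):

s3cmd --version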

plainjimbo
  • 7,070
  • 9
  • 41
  • 55
Jaume Barcelo
  • 101
  • 1
  • 2
  • 2
    copy the contents of the link here, leave the link as reference. – Inbar Rose Oct 21 '12 at 08:50
  • I've tried to update following the original page instructions, but it still fails with a 24GB file, while a 1GB file works. Trying other solutions. – Xavi Montero Jan 10 '13 at 07:35
  • If that doesn't work, install from the tar package. http://sourceforge.net/projects/s3tools/files/s3cmd/1.1.0-beta2/s3cmd-1.1.0-beta2.tar.gz/download – Elmer Apr 22 '13 at 03:26
  • 1
    Indeed, it didn't work for me. It updated to 1.0.x but had the same issue. As @user1681360 suggested, building the tarball (v 1.5.x) fixed the issue (it uploaded using multi-part). – DavidJ May 02 '13 at 20:55
  • I had this problem uploading a 38MB file because I was using a t1.micro instance with limited bandwidth - changing to an m1-medium instance solved the problem. – devstopfix Apr 07 '14 at 10:22
6

This error occurs when Amazon returns an error: they seem to then disconnect the socket to keep you from uploading gigabytes of request just to get back "no, that failed" in response. This is why some people get it due to clock skew, some due to policy errors, and others run into size limitations requiring the use of the multi-part upload API. It isn't that everyone is wrong, or even looking at different problems: these are all different symptoms of the same underlying behavior in s3cmd.

As most error conditions are going to be deterministic, s3cmd's behavior of throwing away the error message and retrying more slowly is rather unfortunate :(. To get the actual error message, you can go into /usr/share/s3cmd/S3/S3.py (remembering to delete the corresponding .pyc so the changes are used) and add a print e in the send_file function's except Exception, e: block.

In my case, I was trying to set the Content-Type of the uploaded file to "application/x-debian-package". Apparently, s3cmd's S3.object_put 1) does not honor a Content-Type passed via --add-header and yet 2) fails to overwrite the Content-Type added via --add-header as it stores headers in a dictionary with case-sensitive keys. The result is that it does a signature calculation using its value of "content-type" and then ends up (at least with many requests; this might be based on some kind of hash ordering somewhere) sending "Content-Type" to Amazon, leading to the signature error.

In my specific case today, it seems like -M would cause s3cmd to guess the right Content-Type, but it seems to do that based on filename alone... I would have hoped that it would use the mimemagic database based on the contents of the file. Honestly, though: s3cmd doesn't even manage to return a failed shell exit status when it fails to upload the file, so combined with all of these other issues it is probably better to just write your own one-off tool to do the one thing you need... it is almost certain that in the end it will save you time when you get bitten by some corner-case of this tool :(.
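
If all you need is to upload with an explicit Content-Type and get a meaningful exit status, the AWS CLI mentioned in other answers handles both; a sketch, with the file and bucket names as placeholders:

# --content-type sets the header explicitly; the command exits non-zero on failure
aws s3 cp mypackage.deb s3://thebucket/mypackage.deb --content-type application/x-debian-package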

Jay Freeman -saurik-
  • 1,759
  • 1
  • 13
  • 13
  • Thank you for making clear that s3cmd isn't as good as its popularity made me believe. Using `aws s3 cp` now. – tobltobs Jan 22 '16 at 17:21
5

s3cmd 1.0.0 does not support multi-part yet. I tried 1.1.0-beta and it works just fine. You can read about the new features here: http://s3tools.org/s3cmd-110b2-released
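
If the beta isn't packaged for your distribution, one option (also mentioned in a comment above) is to install from the tarball; a rough sketch, assuming Python 2 and that the SourceForge link still works:

wget -O s3cmd-1.1.0-beta2.tar.gz http://sourceforge.net/projects/s3tools/files/s3cmd/1.1.0-beta2/s3cmd-1.1.0-beta2.tar.gz/download
tar xzf s3cmd-1.1.0-beta2.tar.gz
cd s3cmd-1.1.0-beta2
sudo python setup.py install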

Josh Gagnon
  • 5,342
  • 3
  • 26
  • 36
Jirapong
  • 24,074
  • 10
  • 54
  • 72
  • 1
    I wish I could upvote this more: it's the simplest solution to the problem described by Alister Bulman (not the problems described by Jaume Barcelo, qliq, or others). `s3cmd-1.1.0-betaX` (`beta3` at the time of writing) not only does the splitting and uploading for you, but it asks Amazon to re-combine the files so that they appear as one file on S3. *THIS IS ESSENTIAL* if you're going to use it in Elastic Map-Reduce, where you don't have the option to recombine them by hand with `cat`. – Jim Pivarski Dec 04 '13 at 19:57
4

I experienced the same issue; it turned out to be a bad bucket_location value in ~/.s3cfg.

This blog post led me to the answer.

If the bucket you're uploading to doesn't exist (or you mistyped it), it'll fail with that error. Thank you, generic error message. - See more at: http://jeremyshapiro.com/blog/2011/02/errno-32-broken-pipe-in-s3cmd/

After inspecting my ~/.s3cfg I saw that it had:

bucket_location = Sydney

Rather than:

bucket_location = ap-southeast-2

Correcting this value to use the proper name(s) solved the issue.
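
If you're not sure what the proper region name is for an existing bucket, you can ask S3 directly; a sketch assuming the AWS CLI is configured (the bucket name is a placeholder):

# prints the LocationConstraint, e.g. ap-southeast-2 (null means us-east-1)
aws s3api get-bucket-location --bucket thebucket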

Nick Breen
  • 359
  • 2
  • 9
4

In my case, the reason for the failure was the server's time being ahead of the S3 time: my server (located in US East) was set to GMT+4 while I was using Amazon's US East storage facility.

After adjusting my server to the US East time, the problem was gone.
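
Correcting clock drift can be as simple as a one-off NTP sync; a minimal sketch, assuming ntpdate is installed:

# one-off clock sync against a public NTP pool
sudo ntpdate pool.ntp.org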

qliq
  • 11,695
  • 15
  • 54
  • 66
2

For me, the following worked:

In .s3cfg, I changed the host_bucket:

host_bucket = %(bucket)s.s3-external-3.amazonaws.com
1

s3cmd version 1.1.0-beta3 or later will automatically use multipart uploads to allow sending arbitrarily large files (source). You can control the chunk size it uses too, e.g.

s3cmd --multipart-chunk-size-mb=1000 put hugefile.tar.gz s3://mybucket/dir/

This will do the upload in 1 GB chunks.

overthink
  • 23,985
  • 4
  • 69
  • 69
0

I encountered the same broken pipe error because the security group policy was set incorrectly. I blame the S3 documentation.

I wrote about how to set the policy correctly in my blog, which is:

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": "arn:aws:s3:::example_bucket",
      "Condition": {}
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:GetObjectVersion",
        "s3:GetObjectVersionAcl",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:PutObjectAclVersion"
      ],
      "Resource": "arn:aws:s3:::example_bucket/*",
      "Condition": {}
    }
  ]
}
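
For reference, one way to attach a policy like this is as an inline IAM user policy via the AWS CLI; a sketch, with the user name, policy name, and file name all being placeholders:

aws iam put-user-policy --user-name s3-uploader --policy-name s3cmd-upload \
    --policy-document file://policy.json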
samwize
  • 25,675
  • 15
  • 141
  • 186
0

In my case, I fixed this just by adding the right permissions:

Bucket > Properties > Permissions 
"Authenticated Users"
- List
- Upload/Delete
- Edit Permissions
Ignacio Pascual
  • 1,895
  • 1
  • 21
  • 18
0

I encountered a similar error which eventually turned out to be caused by a time drift on the machine. Correctly setting the time fixed the issue for me.

yoniLavi
  • 2,624
  • 1
  • 24
  • 30
0

Search for the .s3cfg file, generally in your home folder.

If you have it, you've found the villain. Changing the following two parameters should help:

socket_timeout = 1000
multipart_chunk_size_mb = 15
Kaey
  • 4,615
  • 1
  • 14
  • 18
-1

I addressed this by simply not using s3cmd. Instead, I've had great success with the Python project S3-Multipart on GitHub. It handles uploading and downloading, using as many threads as desired.

Dolan Antenucci
  • 15,432
  • 17
  • 74
  • 100
  • Not sure why I got downvoted -- real productive to not comment -- but I will note that I stopped using this project, which may have given me some corrupted data at one point, and I just use the AWS CLI exclusively. – Dolan Antenucci Apr 29 '16 at 12:42