
I want to sync Git repositories to AWS S3 for backups. I also want the public to be able to git clone my backups. My steps were:

s3cmd mb s3://lktesting
git update-server-info
s3cmd -P sync .git/ s3://lktesting
s3cmd ws-create s3://lktesting
s3cmd ws-info s3://lktesting

I thought this used to work, but now I get:

git clone http://lktesting.s3-website-ap-southeast-1.amazonaws.com/
Cloning into 'lktesting.s3-website-ap-southeast-1.amazonaws.com'...
error: The requested URL returned error: 403 Forbidden (curl_result = 22, http_code = 403, sha1 = bf866b95d9517ea38e213740cead5cf1c313f5aa)
Checking connectivity... done.

Does anyone know what I am missing?

hendry

3 Answers


If you want to avoid sync issues (such as a missing .git/objects/... file), do not sync the contents of .git.

Use a git bundle so that you copy only one file: a compressed representation of your git repository (see "How can I email someone a git repository?").
That one file acts as a full-fledged git repo: you can git clone from it.

cd /path/to/your/repo
git bundle create /tmp/myrepo.bundle --all
s3cmd -P sync /tmp/myrepo.bundle s3://lktesting
git clone http://lktesting.s3-website-ap-southeast-1.amazonaws.com/myrepo.bundle
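The bundle workflow can be tried locally before involving S3. This is a minimal sketch that assumes only Git and a POSIX shell; the names `demo-repo` and `restored` and the commit message are hypothetical. Cloning from the local bundle file is the well-trodden path; if cloning the bundle's HTTP URL directly does not work with your Git version, download the file first (for example with `curl -O`) and clone the local copy.

```shell
#!/bin/sh
# Local sketch of the bundle workflow; demo-repo and restored are hypothetical names.
set -eu

work=$(mktemp -d)
cd "$work"

# A throwaway source repository with a single commit.
git init -q demo-repo
cd demo-repo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Pack every ref (and HEAD) into one file, as in the answer.
git bundle create ../myrepo.bundle --all

# Check that the bundle is complete (it has no prerequisite commits).
git bundle verify ../myrepo.bundle
cd ..

# The bundle file can be cloned like a repository path.
git clone -q myrepo.bundle restored
git -C restored log --oneline
```

To publish a newer state, the bundle is simply recreated and re-uploaded; nothing is pushed into it.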

You cannot push to it, though, so you might want to clone it directly in your S3 instance and clone from that uncompressed S3 repo.

VonC

Git objects under .git may exist as single (loose) files or inside git packs. The Git dumb HTTP protocol first tries to fetch an object as a single file, and only if that fails with "404 Not Found" does it look for a pack.
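A minimal local sketch of the two storage forms, assuming only a local Git install (the temporary repository and commit message are hypothetical). It shows a commit stored as a loose file, the same object moving into a pack after `git gc`, and the `info/refs` and `objects/info/packs` files that `git update-server-info` writes for dumb HTTP clients. After the repack, requesting the loose path from a plain web server gives a "not found" answer, which is the client's cue to read `objects/info/packs`; a server that answers 403 instead breaks that fallback.

```shell
#!/bin/sh
# Local sketch: loose vs packed objects and the files update-server-info writes.
# The temporary repository and commit message are hypothetical.
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"

# A loose object lives at .git/objects/<first 2 hex>/<remaining 38 hex>.
sha=$(git rev-parse HEAD)
loose=".git/objects/$(echo "$sha" | cut -c1-2)/$(echo "$sha" | cut -c3-)"
test -f "$loose" && echo "before gc: loose file $loose"

# Repack: the object moves into a pack and the loose copy is deleted.
git gc -q
if [ ! -e "$loose" ]; then
  echo "after gc: no loose file, the object is only in a pack"
fi

# Write the index files a dumb HTTP client reads:
# info/refs lists the refs; objects/info/packs lists the available packs.
git update-server-info
cat .git/info/refs
cat .git/objects/info/packs
```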

Apparently, an Amazon S3 bucket only returns a 404 for a missing key if you grant the "List" permission to everyone; otherwise it returns 403. See: How do I have an S3 bucket return 404 (instead of 403) for a key that does not exist in the bucket/

Update: You can grant the necessary permission with the AWS CLI, using put-bucket-acl from s3api.

Complete sequence of commands to host a clonable git repository in an S3 bucket:

BUCKET=my-bucket-name

# Setup
aws s3 mb s3://$BUCKET
aws s3api put-bucket-acl --bucket $BUCKET --acl public-read

# Sync
git update-server-info
aws s3 sync --acl public-read .git s3://$BUCKET

# Clone
git clone https://$BUCKET.s3.amazonaws.com
Bruno De Fraine
  • I'll accept your answer if you explicitly show how to add read permissions, e.g. `aws s3api put-bucket-acl --bucket $BUCKET --acl public-read` – hendry Jan 06 '16 at 05:30

Running the exact same approach with an empty repository seems to work okay.

When I run the same command (git clone) with debug flags [0], some content is copied locally, but certain objects [1] referenced in the git repo don't appear to be present in the S3 bucket (403 is the default response code S3 returns when a key isn't present). Did your sync complete fully?

[0]

GIT_CURL_VERBOSE=1 GIT_TRACE=1 git clone http://lktesting.s3-website-ap-southeast-1.amazonaws.com/
[...]
GET /objects/03/4261c96d614614344a1b618c8ec3d8d2ff7d3c HTTP/1.1
Host: lktesting.s3-website-ap-southeast-1.amazonaws.com
User-Agent: git/2.5.4 (Apple Git-61)
Accept: */*

* The requested URL returned error: 403 Forbidden

[1] /objects/03/4261c96d614614344a1b618c8ec3d8d2ff7d3c

alexjs
  • It would appear objects/bf/866b95d9517ea38e213740cead5cf1c313f5aa does not exist in my local or remote copy. I wonder why git is looking for it. – hendry Jan 01 '16 at 11:41
  • It looks like it's referenced in objects/pack/pack-c09c1942e51effe9e1ce1106a8f1f57f845b0dee.idx, but I don't know enough about Git to say why. In this case, I imagine S3's default behaviour of returning a 403 instead of a 404 for a missing object is what causes Git's unexpected response. The clone does at least give me a repository output into `lktesting.s3-website-ap-southeast-1.amazonaws.com/`, but I don't know whether there are objects missing. – alexjs Jan 01 '16 at 11:43
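To check the observation that a "missing" loose object is referenced in a pack index, a sketch like the following tests whether an object exists only in packed form. It builds a hypothetical throwaway repository so it runs anywhere; in the real case you would set `sha` to the hash from the 403 error and run the loop inside the repository that was synced.

```shell
#!/bin/sh
# Sketch: is an object packed rather than loose? Uses a hypothetical throwaway repo;
# in a real case, set sha to the hash from the 403 error and run inside that repo.
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git gc -q

sha=$(git rev-parse HEAD)

# Git can read the object, so it is not missing from the repository...
git cat-file -t "$sha"

# ...but it may exist only inside a pack. For version-2 index files, show-index
# prints one "<offset> <object id> (<crc32>)" line per object.
for idx in .git/objects/pack/pack-*.idx; do
  if git show-index < "$idx" | grep -q "$sha"; then
    echo "$sha is stored in ${idx%.idx}.pack"
  fi
done
```

If the object turns up in a pack, the clone failure points at how the server answers the loose-object request rather than at an incomplete sync.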