97

Does anyone know how to do this? So far I haven't been able to find anything useful via Google.

I'd really like to set up a local repo and use git push to publish it to S3, the idea being to have local version control over assets but remote storage on S3.

Can this be done, and if so, how?

Andrew
  • 1
    OP, Currently the accepted answer does not apply to the question posed. Would it be possible to update for the greater good? I believe `s3fs` to be a viable solution. – bnjmn Nov 02 '12 at 15:09
  • 1
    @Benjamin Thanks for calling my attention back to this one, it's updated. – Andrew Nov 03 '12 at 00:34
  • 1
    Wouldn't it make sense to deploy to any remote repository (to preserve a backup history of commits) and use a [git hook](https://git-scm.com/docs/githooks) to simply sync with an S3 bucket (e.g.: `s3cmd sync …`)? – Fabien Snauwaert Aug 24 '17 at 15:41
  • Here are the steps: https://metamug.com/article/jgit-host-git-repository-on-s3.php – Sorter May 28 '18 at 08:27

10 Answers

53

1. Use JGit via http://blog.spearce.org/2008/07/using-jgit-to-publish-on-amazon-s3.html

Download jgit.sh, rename it to jgit and put it in your path (for example $HOME/bin).

Setup the .jgit config file and add the following (substituting your AWS keys):

$ vim ~/.jgit

accesskey: aws access key
secretkey: aws secret access key
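
(If you did want the repository files on S3 to be world-readable, you would add the acl option mentioned in the note below; a sketch of that variant of ~/.jgit:)

accesskey: aws access key
secretkey: aws secret access key
acl: public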

Note: by not specifying acl: public in the .jgit file, the Git files on S3 will be private (which is what we want). Next create an S3 bucket to store your repository in, let's call it git-repos, and then create a Git repository to upload:

s3cmd mb s3://git-repos
mkdir chef-recipes
cd chef-recipes
git init
touch README
git add README
git commit README
git remote add origin amazon-s3://.jgit@git-repos/chef-recipes.git

In the above I’m using the s3cmd command line tool to create the bucket but you can do it via the Amazon web interface as well. Now let’s push it up to S3 (notice how we use jgit whenever we interact with S3, and standard git otherwise):

jgit push origin master

Now go somewhere else (e.g. cd /tmp) and try cloning it:

jgit clone amazon-s3://.jgit@git-repos/chef-recipes.git

When it comes time to update it (because jgit doesn’t support merge or pull) you do it in 2 steps:

cd chef-recipes
jgit fetch
git merge origin/master

2. Use a FUSE-based file system (s3fs) backed by Amazon S3

  1. Get an Amazon S3 account!

  2. Download, compile, and install s3fs (see the project's InstallationNotes).

  3. Specify your Security Credentials (Access Key ID & Secret Access Key) by one of the following methods:

    • using the passwd_file command line option

    • setting the AWSACCESSKEYID and AWSSECRETACCESSKEY environment variables

    • using a .passwd-s3fs file in your home directory

    • using the system-wide /etc/passwd-s3fs file

  4. Mount the bucket:

/usr/bin/s3fs mybucket /mnt

That's it! The contents of your Amazon S3 bucket "mybucket" should now be accessible read/write at /mnt.
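
To tie this back to the question: with the bucket mounted, you can treat it as ordinary disk and host a bare repo on it. A minimal sketch (the mount point, repo, and remote names are assumptions, not from the answer):

# Seed a bare repo on the mounted bucket, then use it as a normal remote.
git clone --bare myproject /mnt/myproject.git
cd myproject
git remote add s3 /mnt/myproject.git
git push s3 master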

Bruno Bronosky
Riceball LEE
  • 1
    I followed these directions but I get a `The request signature we calculated does not match the signature you provided. Check your key and signing method` error when I try to `jgit push origin master`. Any idea how I can make that go away? – john Feb 06 '12 at 21:20
  • 5
    Why wouldn't you just use git? This seems like a lot of extra work/stuff just for a simple remote git repo on aws... – cmcculloh Apr 24 '12 at 18:31
  • 1
    I would suggest updating this (real) answer to go one step further with a `post-receive` hook that checks out the `GIT_WORK_TREE`. See [here](http://toroid.org/ams/git-website-howto) for more details. I ended up getting this to work quite well with `s3fs`. Highly recommended and thanks for helping me get started. – bnjmn Nov 02 '12 at 15:00
  • 2
    @cmculloh if you don't already have an EC2 instance then that can be more trouble and cost long-term just to have a git repo. Also S3 storage is much more durable by default; to get the same durability on EC2 you'd have to be backing snapshots up to S3 anyway – Jeremy Mar 23 '13 at 15:01
  • 1
    It seems the blog post has been updated so that the jgit project link now points to the egit project's repository [http://www.eclipse.org/egit/], which makes the whole of solution 1 confusing to follow. After a bit of searching I found the original jgit project page from which jgit.sh can be downloaded and used. The link is http://www.eclipse.org/jgit/download/ for anyone who may need it in future. – M N Islam Shihan Dec 06 '13 at 19:18
  • Is there any option to use Jgit with IAM Roles (Besides pulling the creds externally)? – Yaron Jun 16 '14 at 08:47
11

Dandelion is another CLI tool that will keep Git repositories in sync with S3/FTP/SFTP: http://github.com/scttnlsn/dandelion
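
As a rough sketch of the workflow (based on the project README; verify the commands against the repo): Dandelion is a Ruby gem, configured via a dandelion.yml in the repo root, after which deploying the committed changes is a single command:

gem install dandelion
dandelion deploy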

scttnlsn
  • No idea who downvoted this. I think this solution is good if the goal is to have incremental deploys alongside git for local version management and S3 for hosting. That's what seems to be the question. – Tabrez Jan 02 '14 at 18:51
10

git-s3 - https://github.com/schickling/git-s3

You just have to run git-s3 deploy

It comes with all the benefits of a Git repo and uploads/deletes just the files you've changed.
Note: Deploys aren't implicit via git push but you could achieve that via a git hook.
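
A minimal sketch of such a hook (an assumption on my part; it presumes the git-s3 binary is on your PATH):

#!/bin/sh
# .git/hooks/pre-push -- deploy to S3 whenever you push
git-s3 deploy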

schickling
10

You can also do this using the AWS CLI and Git (with hooks). Verified working on Windows 10. Should work on Linux/Mac.

Set up sync to S3 on commit

  1. Install AWS CLI
  2. Setup IAM programmatic access credentials (you can limit to S3 and even down to just the bucket).
  3. Configure AWS CLI with the credentials
  4. Create the S3 bucket in AWS console, or on the CLI.
  5. Ensure the bucket is private.
  6. Make a new bare git repo of your existing git project:
mkdir myproject.git
cd myproject.git
git init --bare

NOTE: The bare repo will serve as the upstream, and it will only contain the changes you want to upload to the S3 bucket, not ignored files, local Git configuration, etc.

  7. Install this hook as post-update into the hooks directory of the bare myproject.git repo.

    #!/bin/sh; C:/Program\ Files/Git/usr/bin/sh.exe
    # Sync the contents of the bare repo to an S3 bucket.
    aws s3 sync . s3://myproject/ --delete
    

    Note: Add the --delete option to make sure files that are deleted locally are deleted from the bucket. Using the --exact-timestamps option can optimize uploading.

    --exact-timestamps (boolean) When syncing from S3 to local, same-sized items will be ignored only when the timestamps match exactly. The default behavior is to ignore same-sized items unless the local version is newer than the S3 version.

    --delete (boolean) Files that exist in the destination but not in the source are deleted during sync.

    See the aws s3 sync documentation for more details and options.

  8. Update the hook with the correct S3 bucket name.

  9. Now cd into your myproject directory and add the bare repo as an upstream remote; name it s3, for example:

git remote add s3 path/to/bare/directory/myproject.git 

NOTE: You can use a relative path for the path to the bare directory.

Testing

  1. Add changes to your repo and commit.
  2. Push your changes to s3 upstream when you want to sync changes to your S3 bucket.
  3. You should see the changes sync to the S3 bucket you specified; you can also view the bucket's contents to verify everything worked. (A sketch of this flow follows.)
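
Concretely, that test flow might look like this (remote name s3 as above; the branch name is an assumption):

cd myproject
git add . && git commit -m "update assets"
git push s3 master
# the post-update hook in the bare repo then runs: aws s3 sync . s3://myproject/ --delete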


b01
    This is OK if you already have the AWS CLI and Git installed. It assumes knowledge of Git hooks, and that you know how to set up IAM users and programmatic credentials and create S3 buckets. – b01 Dec 17 '19 at 03:36
  • So do I understand correctly that this has two repos locally and the bare local repo is setup to sync itself to s3 via post-update? – Josiah Jul 16 '20 at 14:17
    @Josiah Yes, that is correct. While it is not necessary, it helps to simplify syncing only the files you want in S3, since the bare repo will only contain files that are committed. I tried this without the bare repo, but then I had to exclude some files from syncing to S3, noticeably anything in the .gitignore file. – b01 Jul 18 '20 at 19:07
    @PhilipCouling There is no claim that the AWS CLI sync is equivalent to git here. If there is, please point it out and/or correct it. Git is responsible for keeping the bare repository up to date, while the S3 sync is responsible for making sure the files stored in S3 match the local files in the bare repository. Unfortunately, keeping the times and dates of the files in sync may not be possible with how S3 works. See this issue about how `--exact-timestamps` works: [Retain original modified date time #6601](https://github.com/aws/aws-cli/issues/6601#issuecomment-990137502) – b01 Jan 07 '23 at 19:32
    @PhilipCouling I believe this answers the OP's question, and I also believe it to be a good working solution for storing files in S3 when running `git push`. As long as Git does not get thrown off by dates in S3, it will be a great option for storing your Git repos in S3 without corruption. If you have proof otherwise, can you please provide an example? I'm sure we can perform some kind of workaround or fix to address your concern. Again, I'm just asking for more clarity around your concern. – b01 Jan 07 '23 at 19:40
  • @PhilipCouling "there are multiple local git copies syncing to S3 this way" How so? – b01 Jan 08 '23 at 21:13
    @PhilipCouling All you've done is made a statement with nothing to back it up. You are also implying that having Git update a bare repository and using `aws s3 sync` to upload files to S3 for backup will somehow corrupt something (I have to assume you mean the repository itself or some of the files; you're not being clear). There is no harm in asking for proof of this. – b01 Jan 08 '23 at 22:41
    When you do a git push to the bare repo, it will fail or succeed. On success, the hook uploads the changes to the S3 bucket. If the push fails because the bare repo is inconsistent with your local working (non-bare) copy, the hook will not be executed (unless the wrong hook was used here). – b01 Jan 08 '23 at 22:46
  • Until a scenario/example is provided where this solution fails I see no reason to call this a bad solution. – b01 Jan 08 '23 at 22:48
  • 1
    You are still missing the basic point that aws s3 sync does NOT copy every file every time and does NOT use hashes to determine if files have changed. Its decision on whether files have changed is imperfect. This leads to some ugly cases where S3 sync will not upload a file which was in truth modified. See here: https://stackoverflow.com/a/43531938/453851 What I've tried to tell you is that this can cause corruption. If you don't believe me that's just fine. I've put my warning in a comment and downvoted accordingly. – Philip Couling Jan 09 '23 at 01:27
    So your claim is that in some unknown case the AWS CLI will fail to sync a file. OK. But it could also fail to upload a file due to a network issue or a handful of other reasons. You fail to see that giving some vague warning with no way to avoid it is just a waste of time. – b01 Jan 10 '23 at 03:17
  • It's not "unknown" or "vague". AWS s3 sync uses file sizes and timestamps (see the link above) timestamps can only be compared greater / not greater, it doesn't even use an exact match. git modifies some files without changing the size. So it's pretty easy for your solution to trip up on timestamps alone! **I don't offer any way to avoid it... it's unavoidable with AWS s3 sync** ergo it's a bad solution! – Philip Couling Jan 11 '23 at 02:10
  • I'm trying to understand how to set this up and I'm a little confused. The bare repo copy is local to your machine, is that correct? But you don't seem to be saying to clone the repo, or am I supposed to make a bare copy from the one I already have locally? And how do you specify which files you want to be synced vs. excluded? – szeitlin Aug 22 '23 at 21:18
3

You can use mc, aka the Minio client; it's written in Go and available under the open-source Apache License. It is available for Mac, Linux, Windows, and FreeBSD. You can use the mc mirror command to achieve your requirement.

mc GNU/Linux Download

64-bit Intel from https://dl.minio.io/client/mc/release/linux-amd64/mc
32-bit Intel from https://dl.minio.io/client/mc/release/linux-386/mc
32-bit ARM from https://dl.minio.io/client/mc/release/linux-arm/mc
$ chmod +x mc
$ ./mc --help

Configuring mc for Amazon S3

$ mc config host add mys3 https://s3.amazonaws.com BKIKJAA5BMMU2RHO6IBB V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12
  • Replace with your access/secret key
  • By default mc uses signature version 4 of amazon S3.
  • mys3 is Amazon S3 alias for minio client

Mirror your local Git repository/directory (say, mygithub) to the Amazon S3 bucket mygithubbkp:

$ ./mc mirror mygithub mys3/mygithubbkp
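
If you want this mirroring to happen automatically on every commit (my addition, not part of the original answer), a git hook sketch (assumes mc is on your PATH and the mys3 alias from above):

#!/bin/sh
# .git/hooks/post-commit -- mirror the working tree to S3 after each commit
# (git runs hooks from the repository root, so "." is the repo directory)
mc mirror . mys3/mygithubbkp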

Hope it helps. Disclaimer: I work for Minio.

koolhead17
1

You can use the DeployBot (http://deploybot.com/) service, which is free for a single Git repository.

You can automate the deployment by choosing "automatic" in the deployment mode section.

I am using it now. It is very easy and useful.

Jayaprakash
0

Perhaps use s3 sync from the awscli.

If you want to ignore the same local files you ignore when pushing to a remote repository, you'll want the --exclude flag. This approach was encouraged by some AWS internal training, and it works, but sync includes everything in your folder, including __pycache__ and any other files you want ignored, unless you list them with that exclude flag. If you prefer this method, you can write a script with a .sh extension containing a huge series of --exclude flags for all the files/directories you want to ignore:

aws s3 sync . s3://fraschp/mizzle/ \
    --exclude ".git/*" \
    --exclude "*__pycache__/*" \
    --exclude "*.csv"

More information about the syntax or rationale, especially about include/exclude, is available in the docs.

I like this vanilla approach because I don't have to install anything and it complies with any security considerations baked into s3 tooling.

  • 1
    Just a note that aws s3 sync compares time stamps and sizes. This may have trouble knowing which files to upload and therefore fail to upload some files. – Philip Couling Jan 11 '23 at 02:23
0

This can be done with IDrive e2, idrive.com; free tier is 10GB in 2023.

Make a mount such as ~/s3-bucket, then create a bare repo in it:

git clone --bare . ~/s3-bucket/my_git_repo.git

Then you can git push to it; and once the S3 bucket is mounted on another host, git pull your files.
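
A minimal sketch of that round trip (the remote name and branch are assumptions):

# On the first host (repo already seeded into the mount as above):
cd my_project
git remote add s3 ~/s3-bucket/my_git_repo.git
git push s3 master

# On another host with the same bucket mounted:
git clone ~/s3-bucket/my_git_repo.git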

Instructions for setting up an S3 bucket mount can be found in:

  1. "A Guide on How to Mount Amazon S3 as a Drive for Cloud File Sharing" by NAVIKO Team. Very useful info, even if you are using the second reference from below.
  2. Specifically IDrive instructions: "How do I use S3FS with IDrive"

This won't work in the Google Colab free tier, since sudo privileges are needed to set up S3FS; but if you can pay for the S3 bucket to be set as public, you can access it via HTTP without setting up S3FS.

Watch your micropayments.

  • I removed what seems like an afterhought question you are asking - which should not be part of an answer. Feel free to [edit] in case I misunderstood your meaning, ideally according to [answer]. – Yunnosch Aug 25 '23 at 21:33
-1

Version control your files with GitHub? This script (and its associated GitHub/AWS configurations) will take new commits to your repo and sync them into your S3 bucket.

https://github.com/nytlabs/github-s3-deploy

ChatGPT
-1

You need JGit for it.

Just save a .jgit file in your user (home) directory with your AWS credentials, and you can use Git with S3.
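
For reference, the .jgit file follows the same two-line format shown in the JGit answer above:

accesskey: aws access key
secretkey: aws secret access key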

Here is what your Git URL will look like:

amazon-s3://.jgit@mybucket/myproject.git

You can do everything you do with git with jgit.

Get a complete setup guide here.

https://metamug.com/article/jgit-host-git-repository-on-s3.html

Sorter