Goal: Add a new file to a remote git repository without checking the whole thing out locally.

Why: I'm building an app that will add files to a user's git repository. Some of these repositories will be hundreds of megabytes. Some will be touched very infrequently. I want to avoid taking up terabytes of disk space to keep large repos checked out that won't be touched often, and I don't want to incur the inevitable delay of checking out a 200 MB repo (lots of binary files) just to add one new file to it and push that file back to the origin.

I'm assuming the default git client can NOT do this, but am hoping that someone has written something that can commit to a remote repo (don't care what language) without having the whole thing checked out locally. Does the Cloud9 IDE do something like this?

The app would have full access to the user's git repo, either via SSH or whatever mechanism GitHub uses for OAuth-authorized apps to modify repos.

masukomi

5 Answers

Depending on the structure of your repository, you may be able to use sparse checkouts to avoid downloading the large files:

http://schacon.github.com/git/git-read-tree.html#_sparse_checkout

(More information at Checkout subdirectories in Git?)

cmbuckley
  • It's a good idea but it seems that sparse-checkout works by checking out the entire (compressed) repo locally, but only placing a portion of it into the worktree. Right? Unless I'm misunderstanding it. In which case I'd still have the problem of having to allocate storage space for the compressed repo which would still be hundreds of megs in many cases because of binary files (like jpegs) that already had decent compression on them before git gzipped them. – masukomi Dec 24 '11 at 18:18
  • If that's the case, maybe [submodules](http://book.git-scm.com/5_submodules.html) are the way to go. – cmbuckley Dec 25 '11 at 01:59

Git has several options for shallow cloning and for path-specific partial cloning, but you cannot push from the resulting clones.

The trick here is to utilize the --lightweight flag when you checkout - in which case, you will be able to push to a limited repository.

However, these solutions appear far from ideal. It seems more intuitive that, if the app has programmatic access to a git repository, you should be able to create (or require, at installation time, the creation of) an empty git project that is specific to your application's needs.

jayunit100
  • shallow cloning doesn't help because in many cases these will be hundreds of megs even without history. I'm not quite following your idea about an empty repo, because the goal here is to add a useful file to a populated repo. – masukomi Dec 24 '11 at 18:23
  • I think you might be right here. I'm now starting to realize how difficult your question is, since git is designed to maintain the whole repo locally, which runs counter to your needs. – jayunit100 Dec 24 '11 at 20:45

If you have shell access to the repos, you can create a fast-import file (see the manpage of git-fast-import for the details of the file format), and execute git fast-import inside the remote repository.
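As a rough illustration of the fast-import approach, the sketch below builds a stream that adds one file on top of an existing branch. The function name, the fixed `+0000` offset, and the sample values are assumptions made for illustration; the stream layout follows the `commit`, `committer`, `data`, `from` and `M ... inline` commands described in the git-fast-import manpage. It assumes a simple path that needs no quoting.

```python
import time


def build_fast_import_stream(branch, path, content, message,
                             name, email, when=None):
    """Return a fast-import stream (bytes) that adds one file to `branch`.

    `content` must be bytes. All parameter names here are hypothetical.
    """
    when = int(time.time()) if when is None else when
    msg = message.encode("utf-8")
    lines = [
        # Start a new commit on the target branch.
        b"commit refs/heads/" + branch.encode(),
        f"committer {name} <{email}> {when} +0000".encode(),
        # 'data <n>' is followed by exactly n raw bytes.
        b"data " + str(len(msg)).encode(),
        msg,
        # Build on the branch's current tip so existing files are kept.
        b"from refs/heads/" + branch.encode() + b"^0",
        # Add (or modify) one regular file, with its content given inline.
        b"M 100644 inline " + path.encode(),
        b"data " + str(len(content)).encode(),
        content,
        b"",
    ]
    return b"\n".join(lines)


if __name__ == "__main__":
    import sys
    stream = build_fast_import_stream(
        "master", "docs/new.txt", b"hello\n", "Add new.txt",
        "App", "app@example.com")
    # Intended to be piped into `git fast-import` run inside the remote repo.
    sys.stdout.buffer.write(stream)
```

The output would be fed to `git fast-import` on the machine that holds the repository, which is why this route needs shell access.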

When you don't have shell access, the solution is much hackier. You need to perform the following tasks to push your changes to a remote repo:

  • create the content of all files (=generate the hash of the new file, and find the hashes of the unchanged files [you get them from the current tree])
  • create the tree
  • create the commit
  • push the new file content, tree and commit to the remote repo
  • push the branch-move to the server
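The first three tasks above come down to computing git object IDs. Here is a minimal sketch of how those IDs are derived, assuming a SHA-1 repository and a flat tree of regular files only; the helper names are hypothetical, and the actual push (building a pack and speaking the transfer protocol) is not covered.

```python
import hashlib


def git_object_id(kind, body):
    """Hash an object the way git does: '<kind> <size>\\0' + body, SHA-1."""
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content):
    """ID of a file's content (a blob)."""
    return git_object_id("blob", content)


def tree_body(entries):
    """Serialize a flat tree. `entries` maps file name -> hex blob ID.

    Assumption: regular files only (mode 100644), no subdirectories.
    """
    out = b""
    for name in sorted(entries, key=lambda n: n.encode()):
        # Each entry is '<mode> <name>\0' followed by the raw 20-byte ID.
        out += b"100644 " + name.encode() + b"\0" + bytes.fromhex(entries[name])
    return out


def tree_id(entries):
    return git_object_id("tree", tree_body(entries))


def commit_body(tree, parent, who, message):
    """Serialize a commit. `who` looks like 'Name <email> 1324750000 +0000'."""
    lines = ["tree " + tree]
    if parent:
        lines.append("parent " + parent)
    lines.append("author " + who)
    lines.append("committer " + who)
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


def commit_id(tree, parent, who, message):
    return git_object_id("commit", commit_body(tree, parent, who, message))
```

The hashes of the unchanged files are reused as-is from the current tree, so only the new blob, the new tree and the new commit have to be created and sent.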

I would start with the hg-git extension, since I guess that there is some code which does something similar.

Rudi
  • hg-git was a good start, but it turns out Dulwich (git implemented in Python) is the library doing all the real git interaction there, and the (rather minimal) docs there led me to the ultimate solution. For anyone looking to interact directly with git repos from Python, Dulwich seems to be a potentially good solution. Kind of like Grit for Ruby, except with the remote support that Grit lacks. – masukomi Dec 27 '11 at 02:07

WARNING: THE FOLLOWING ANSWER NO LONGER WORKS WITH THE MOST RECENT VERSIONS OF GIT

(I'm open to suggestions as to how to make this work with current git versions.)

The answer, it turns out, is amazingly simple, and a testament to just how awesome Git is.

  • Create a brand new git repo.
  • Add and commit the new files to it.
  • Tell the new repo where the remote repo is (git remote add ...).
  • Push to the remote repo.

Notes: The remote repo must either be a bare repo or have receive.denyCurrentBranch set to "ignore" or "warn".

This is based on the assumption that the files you are adding are NEW and will not conflict with any other file in the repo.

The existing contents of the remote repository do not matter so long as you can be sure you're not going to conflict with anything.
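The four steps listed above can be spelled out as git invocations. This is only a sketch: the helper names, the remote name `origin`, and pushing `HEAD` are assumptions, and as the warning at the top of this answer says, the final push may be rejected by current git versions.

```python
import subprocess


def plan_push_new_files(remote_url, filenames, message="Add new files"):
    """Return the git commands for the four steps, as argv lists.

    Hypothetical helper; nothing is executed here.
    """
    return [
        ["git", "init"],                                 # brand new repo
        ["git", "add", "--"] + list(filenames),          # stage the new files
        ["git", "commit", "-m", message],                # commit them
        ["git", "remote", "add", "origin", remote_url],  # point at the remote
        ["git", "push", "origin", "HEAD"],               # push current branch
    ]


def run_plan(commands, workdir):
    """Run each command inside `workdir`, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=workdir, check=True)
```

A caller would copy the new files into an empty directory, then call `run_plan(plan_push_new_files(url, names), that_directory)`.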

P.S. Thank you to everyone who posted the potential workarounds.

masukomi
  • I tried this with GitHub but it didn't work. I got "Updates were rejected because the remote contains work that you do not have locally." Steps to reproduce: https://gist.github.com/chris-gunawardena/c88a0d11e0e59368ec45 – Chris Gunawardena Jul 03 '14 at 01:23
  • It appears that the situation has changed with more recent versions of git. I've noted this in the answer. Thank you for your example. I've created [an annotated example of your gist](https://gist.github.com/masukomi/4889766e1520ed3cbac4) that doesn't require github to test. I've tried a couple things but haven't come up with a working solution yet. Will update this ticket when/if I do. – masukomi Jul 03 '14 at 09:19

How about a second, lightweight repository containing just the files you need to add? A script on their end could check out both repos and add your files. That script could also remove the consumed files from your new repo to keep it lightweight.

Dumben
  • that sounds like A workaround, but I'm trying to create something that doesn't require anything special on / around the repos the file(s) will be ultimately added to. – masukomi Dec 25 '11 at 17:39