
I have a large (several GB) Git repository. I want some application to create small files within that repository and commit the changes. This should happen without checking out those gigabytes to the disk.

I found a JGit code sample in which a remote repository is cloned into an in-memory repository.

Can I use JGit (something like the code below) to add a file to a remote repository without checking it out locally (i.e. without transferring gigabytes of data to the machine where that code will run)?

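DfsRepositoryDescription repoDesc = new DfsRepositoryDescription();
InMemoryRepository repo = new InMemoryRepository(repoDesc); // a bare repository held in RAM
Git git = new Git(repo);
git.remoteAdd(); // returns a RemoteAddCommand; needs setName(..), setUri(..) and call()
git.commit();    // returns a CommitCommand; needs setMessage(..) and call()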

Update 1: The size of the entire directory (tracked files plus .git) is 1.2G. The size of .git alone is 573M.

% du -hs .
1.2G    .
% du -hs .git
573M    .git
Glory to Russia
  • Is it OK to fetch and check out only the last commit of this repo? Or does this commit alone take gigabytes? – LeGEC Oct 10 '20 at 21:56
  • @LeGEC Checking out the last commit only should be fine. Most of the commits in this repository are only a couple of text lines. – Glory to Russia Oct 11 '20 at 09:02
  • I guess you are talking about the *diff* introduced by the last commit. My question was: is your repo big because it has many commits over a set of files that would take a few tens of MB when checked out? Or does the checkout of a single commit already take several GB on disk? – LeGEC Oct 11 '20 at 12:44
  • @LeGEC How (using which git commands) can I figure out the size of a single commit? I've never checked out single commits so far. – Glory to Russia Oct 12 '20 at 03:52
  • Size of your project directory minus size of its `.git/` subdir – LeGEC Oct 12 '20 at 05:27
  • @LeGEC This is roughly 500MB (see *Update 1* for details). – Glory to Russia Oct 12 '20 at 13:36


Note that cloning into an in-memory repository downloads exactly the same data from the remote as a regular clone. The only difference is that, once the clone is complete, it uses the 1.2G in RAM rather than on disk.

You may also want to look at how to create and add a "file" using the JGit API, to see whether it is a convenient way to add the data you want to your repository.
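An in-memory repository (like a bare clone) has no working tree, so the porcelain `git.add()` / `git.commit()` commands are generally not usable there. A file can instead be committed by writing the Git objects directly. The following is a minimal sketch, assuming a JGit 5.x/6.x dependency: it stores a blob, builds a tree from the parent commit plus the new file with an in-core `DirCache`, writes a commit with `CommitBuilder`, and advances the branch with `RefUpdate`. The class name, the `commitFile`/`listPaths` helpers, the branch `refs/heads/main` and the committer identity are hypothetical. `InMemoryRepository` lives in JGit's `internal` package, so its API is not guaranteed to stay stable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.eclipse.jgit.dircache.DirCache;
import org.eclipse.jgit.dircache.DirCacheBuilder;
import org.eclipse.jgit.dircache.DirCacheEntry;
import org.eclipse.jgit.internal.storage.dfs.DfsRepositoryDescription;
import org.eclipse.jgit.internal.storage.dfs.InMemoryRepository;
import org.eclipse.jgit.lib.*;
import org.eclipse.jgit.revwalk.RevWalk;
import org.eclipse.jgit.treewalk.TreeWalk;

public class CommitWithoutWorkTree {

    /** Commits `content` as `path` on top of `branch`, touching only the object database and refs. */
    static ObjectId commitFile(Repository repo, String branch, String path,
                               String content, String message) throws IOException {
        ObjectId parent = repo.resolve(branch); // null if the branch does not exist yet
        try (ObjectInserter inserter = repo.newObjectInserter();
             RevWalk walk = new RevWalk(repo)) {
            // 1. Store the file content as a blob object.
            ObjectId blobId = inserter.insert(Constants.OBJ_BLOB,
                    content.getBytes(StandardCharsets.UTF_8));

            // 2. Build an in-core index: every entry of the parent tree except `path`...
            DirCache index = DirCache.newInCore();
            DirCacheBuilder builder = index.builder();
            if (parent != null) {
                try (TreeWalk tw = new TreeWalk(repo)) {
                    tw.addTree(walk.parseCommit(parent).getTree());
                    tw.setRecursive(true);
                    while (tw.next()) {
                        if (tw.getPathString().equals(path)) {
                            continue; // replaced by the new blob below
                        }
                        DirCacheEntry entry = new DirCacheEntry(tw.getRawPath());
                        entry.setFileMode(tw.getFileMode(0));
                        entry.setObjectId(tw.getObjectId(0));
                        builder.add(entry);
                    }
                }
            }
            // ...plus the new (or replaced) file.
            DirCacheEntry added = new DirCacheEntry(path);
            added.setFileMode(FileMode.REGULAR_FILE);
            added.setObjectId(blobId);
            builder.add(added);
            builder.finish(); // sorts the entries if they were added out of order

            // 3. Write the tree and commit objects.
            PersonIdent ident = new PersonIdent("Example Bot", "bot@example.com"); // hypothetical identity
            CommitBuilder commit = new CommitBuilder();
            commit.setTreeId(index.writeTree(inserter));
            if (parent != null) {
                commit.setParentId(parent);
            }
            commit.setAuthor(ident);
            commit.setCommitter(ident);
            commit.setMessage(message);
            ObjectId commitId = inserter.insert(commit);
            inserter.flush(); // make the new objects visible to readers

            // 4. Advance the branch; fails if someone else moved it in the meantime.
            RefUpdate update = repo.updateRef(branch);
            update.setNewObjectId(commitId);
            update.setExpectedOldObjectId(parent != null ? parent : ObjectId.zeroId());
            RefUpdate.Result result = update.update();
            if (result != RefUpdate.Result.NEW && result != RefUpdate.Result.FAST_FORWARD) {
                throw new IOException("Could not update " + branch + ": " + result);
            }
            return commitId;
        }
    }

    /** Lists every file path in the tree of `commitId`. */
    static List<String> listPaths(Repository repo, ObjectId commitId) throws IOException {
        List<String> paths = new ArrayList<>();
        try (RevWalk walk = new RevWalk(repo); TreeWalk tw = new TreeWalk(repo)) {
            tw.addTree(walk.parseCommit(commitId).getTree());
            tw.setRecursive(true);
            while (tw.next()) {
                paths.add(tw.getPathString());
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        try (InMemoryRepository repo = new InMemoryRepository(new DfsRepositoryDescription("demo"))) {
            commitFile(repo, "refs/heads/main", "README.txt", "hello\n", "Add README");
            ObjectId head = commitFile(repo, "refs/heads/main", "data/note.txt", "small file\n", "Add note");
            System.out.println(listPaths(repo, head)); // [README.txt, data/note.txt]
        }
    }
}
```

The same `commitFile` method takes any `Repository`, so it should also work on an on-disk bare clone, after which the new commit can be sent to the server with `git.push()`.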


If you want to limit the amount of data downloaded from your Git server, you can create a shallow clone.
See this SO answer, for example:

git clone --depth 1 <repo_url> -b <branch_name>

This downloads only the objects of the latest commit (a compressed version of the ~500M in your case) and then performs an actual checkout of that commit (the ~500M).
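In JGit, the same shallow clone can be requested through `CloneCommand`. Adding `setBare(true)` also skips the ~500M checkout, because a bare repository has no working tree. This is a rough sketch, assuming JGit 6.3 or newer (the first release with `setDepth`; versions available when this question was asked cannot do shallow clones). The class and method names are hypothetical, and authentication (`setCredentialsProvider`) is omitted.

```java
import java.io.File;
import java.util.List;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.api.errors.GitAPIException;

public class ShallowBareClone {

    /** Rough JGit counterpart of `git clone --bare --depth 1 -b <branch> <url> <dir>`. */
    static Git shallowBareClone(String url, String branch, File dir) throws GitAPIException {
        return Git.cloneRepository()
                .setURI(url)
                .setDirectory(dir)   // for a bare clone this is the repository directory itself
                .setBare(true)       // no working tree, so nothing is checked out
                .setBranch(branch)   // becomes HEAD of the clone
                .setBranchesToClone(List.of("refs/heads/" + branch)) // fetch only this branch
                .setDepth(1)         // fetch only the tip commit and the trees/blobs it needs
                .call();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: ShallowBareClone <repo_url> <branch_name> <target_dir>");
            return;
        }
        try (Git git = shallowBareClone(args[0], args[1], new File(args[2]))) {
            System.out.println("HEAD: " + git.getRepository().resolve("HEAD").name());
        }
    }
}
```

A bare shallow clone like this only holds the compressed objects of the head commit, so the remaining disk (or RAM) usage is roughly the download size rather than download plus checkout.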

You can run this either on disk, or in RAM, depending on your needs.

In your situation this would still require roughly 600-700M, in RAM or on disk, including the checked-out files,
but it would download only about 100M from the server for the compressed version of the head commit (this is a guess: check the actual size by running the shallow clone command from your workstation).

LeGEC