4

I'm trying to check out a particular file from specific commits on a remote.

Please note that the commits are not in my local repo; they exist only in the remote repo.

  1. I do not want to download the raw file through a GitHub/Bitbucket-style web interface, because my remote is not hosted on such a platform.

  2. I do not want to do git fetch followed by git checkout because doing git fetch will download a bunch of other items which I don't want. I'm only interested in that particular file from the particular commit.

jthill
itthrill
  • 1
    Since you used the word "checkout", everyone thinks you want to use proper git processes on this single file. Please try to use the right words since what you *really* want to do is just to download the file from the remote repository. – Lasse V. Karlsen Dec 18 '20 at 07:50
  • Try [`git archive`](https://stackoverflow.com/a/32731709/7976758) or [`git clone --depth=1 --filter`](https://stackoverflow.com/a/52269934/7976758). – phd Dec 18 '20 at 11:07

5 Answers

2

edit: from comments:

I need to check same file from hundreds of commits spread over dozen of branches.

For this, you're going to need cooperation from the other repo's admin.

In Git, history is published by giving it a refname (branch, tag, whatever) and some sort of access via shared filesystem or hosted server.

The stuff that's not worth giving its own refname is either part of published history (that does have its own refname) or it's not.

If it is, Git will ensure you get a complete, internally-consistent pack that brings you up to date with the published history you asked for. Git's laser-focused on making that specific operation as fast and efficient as possible.

If it's not, then the hosting repo hasn't published it and (a) you ordinarily can't get it at all, and (b) you ordinarily don't even know how to ask for it, its object id.

To find an object's id, you have to hunt through history examining snapshots, ... which means you have to have the snapshots ... see?

Git doesn't like paying overhead costs twice, and it's built to be a vcs. You're trying to use it like a shared filesystem. Filesystems are built to be efficient at serving single objects frequently and repeatedly to the same client. dvcs's are built to be efficient at serving multiple complete revisions, at relatively quite long intervals, once per client. This is engineering-tradeoff territory: you can't be superbly efficient at both, and the better you get at one or the other, the harder it is to re-tool and do the other thing.

All that said: if you can get the other repo admin to do some custom work for you, this won't be hard:

git rev-list --branches --objects -- path/to/file | git pack-objects pack

will pack up the history of all branches' versions of that file: the commits that introduce new versions, the trees that show where they go, and their contents, and put it in two files named pack-<hashcode>.{idx,pack}. Put that pack in any repo's objects/pack directory and there you are: you've got everything you need to deal with just that file.
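To make the round trip concrete, here is a minimal sketch under stated assumptions: the "server" and "client" repos are throwaway directories created for the demo, and the file names, commit messages and identity settings are all hypothetical. It only illustrates the pack-and-drop-in idea described above; a real setup would ship the two pack files over whatever channel the admin prefers.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
server="$work/server"   # stands in for the admin's repo (hypothetical)
client="$work/client"   # stands in for your repo (hypothetical)

# Admin side: a toy repo with two versions of the file plus unrelated content.
git init -q "$server"
git -C "$server" config user.email demo@example.com
git -C "$server" config user.name demo
mkdir -p "$server/path/to"
echo v1 > "$server/path/to/file"
echo noise > "$server/other"
git -C "$server" add .
git -C "$server" commit -q -m "first version"
echo v2 > "$server/path/to/file"
git -C "$server" commit -q -a -m "second version"
tip=$(git -C "$server" rev-parse HEAD)

# Pack only the commits, trees and blobs that lead to path/to/file.
# pack-objects writes $work/pack-<hash>.{idx,pack} and prints <hash> on stdout.
hash=$(git -C "$server" rev-list --branches --objects -- path/to/file |
       git -C "$server" pack-objects -q "$work/pack")

# Receiving side: copy both files into any repo's objects/pack directory.
git init -q "$client"
cp "$work/pack-$hash.pack" "$work/pack-$hash.idx" "$client/.git/objects/pack/"

# The client has no refs, but the objects are addressable by commit id.
latest=$(git -C "$client" show "$tip:path/to/file")
echo "$latest"
```

The client repo never sees the unrelated `other` file; only the objects on the path to the file were packed, which is why ordinary commands that want the full tree will complain there.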

Such a sliced-up history is relatively difficult to work with, and the overhead of filling in the missing bits on demand is precisely what Git's built to avoid. But to work with exactly what you've got, you can use e.g. git verify-pack -v to show you the exact contents of a pack and git cat-file -p to print individual objects. The commits in that pack are the ones that introduce new versions; you refer to your file in one of those by appending :path/to/file to its commit id.

So, when you run the verify-pack to see what you've got, you'll get a dump of waaayyyy too much information about its content and structure. To make it useful for your purposes here, you can scrape just the commit ids out, and list those by date order, with

# this is the pack I made for testing 
git verify-pack -v .git/objects/pack/pack-8d3bb7bca6a4cdc086778ad55c79f45e672ae7e5.idx \
| awk '$2=="commit"{print $1}' \
| git rev-list --stdin --date-order --no-walk

Sub in log for rev-list to see the log messages, or show the blob you fetched with e.g. git show <commit-hash>:path/to/file. To show the blobs in time sequence, you can run

git verify-pack -v .git/objects/pack/pack-8d3bb7bca6a4cdc086778ad55c79f45e672ae7e5.idx \
| awk '$2=="commit"{print $1}' \
| git rev-list --stdin --date-order --no-walk --pretty=%h:path/to/file \
| git cat-file --batch

which will dump the content in scannable form.

. . . actually, if an all-in-one dump of the history will do ya, and you just need the content and sequence to match, not so much the resulting commit ids, Git's fast-export might do the whole job for you. Have the admin run

git fast-export --branches -- path/to/file | zstd >my-stuff.zst

which might even be more compact than the pack files (since it doesn't have to preserve id's) and ship that to you.
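On the receiving end, the stream goes into git fast-import. A rough sketch of both halves follows, assuming throwaway repos and hypothetical file names; the zstd step is left out so the demo runs without it installed (in practice you would decompress with something like zstd -dc before piping into fast-import).

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
src="$work/src"   # stands in for the admin's repo (hypothetical)
dst="$work/dst"   # scratch repo on your side (hypothetical)

# Toy source repo: two versions of the file plus unrelated content.
git init -q "$src"
git -C "$src" config user.email demo@example.com
git -C "$src" config user.name demo
mkdir -p "$src/path/to"
echo v1 > "$src/path/to/file"
echo noise > "$src/other"
git -C "$src" add .
git -C "$src" commit -q -m "first version"
echo v2 > "$src/path/to/file"
git -C "$src" commit -q -a -m "second version"

# Admin side: export only the history that touches the file.
git -C "$src" fast-export --branches -- path/to/file > "$work/my-stuff.fe"

# Your side: replay the stream into an empty scratch repo.
git init -q "$dst"
git -C "$dst" fast-import --quiet < "$work/my-stuff.fe"

# Walk the imported commits oldest first and print each version of the file.
versions=$(git -C "$dst" log --reverse --branches --format=%H -- path/to/file |
  while read -r commit; do
    git -C "$dst" show "$commit:path/to/file"
  done)
echo "$versions"
```

The commit ids in the scratch repo will generally differ from the originals, which matches the caveat above: content and sequence survive, ids do not.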

jthill
  • this is interesting; I managed to get .{idx,pack} for the file. I moved .{idx,pack} to objects/pack. I am not clear what is the next thing you are suggesting. I mean how do i leverage on this pack info. – itthrill Dec 18 '20 at 17:01
  • git verify-pack -v pack-ffd199abe9c8db57dff06404dbad16a91458dd11.pack c4e6eabc55304837476d5d4cb02aa79a4ed9082b commit 259 179 12 222fe027d32ac6357000d9c541506a65d43adb06 commit 259 180 191 non delta: 2 objects pack-ffd199abe9c8db57dff06404dbad16a91458dd11.pack: ok – itthrill Dec 18 '20 at 17:04
  • 1
    Okay, I confess I left that part out because I was dubious about getting any interest. I'll add it in now. – jthill Dec 18 '20 at 17:05
  • . . . wait, was that a full dump of your verify-pack output? two objects? I don't see how to get that kind of result from the command I gave, . ... gaak.. I left an arg out of the command I gave, *sorry*. add `--objects` in to the rev-list feeding pack-objects, editing that in now. – jthill Dec 18 '20 at 17:55
2

I do not want to do git fetch followed by git checkout because doing git fetch will download a bunch of other items which I don't want.

You need to do git fetch. That is the way you get the remote server to send you stuff. You can, however, minimize the amount of "extra stuff" that it sends you, using something like

git fetch --force --depth 1 origin $COMMIT_SHA:tmp

which will fetch just the commit $COMMIT_SHA (and all of the files needed to complete it — AFAIK you can't avoid that) from the remote origin, and name it tmp. The --force will prevent failure if a branch named tmp already exists (good for repeated use, but use with care, of course).

Then you can git cat-file blob tmp:somepath or git checkout tmp -- somepath or whatever you want to access the file contents.

If you git branch -D tmp ; git gc when you're done, there should be virtually no accumulated cruft.
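For the "hundreds of commits" case, the same fetch can be wrapped in a small helper. This is only a sketch under assumptions: the toy remote, the fetch_file helper name and the paths are hypothetical, and fetching a non-tip commit by id only works when the server permits it (here the demo remote sets uploadpack.allowAnySHA1InWant, which you may not be able to do on a real server).

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
remote="$work/remote"      # stands in for the real server (hypothetical)
local_repo="$work/local"   # your working repo (hypothetical)

# Toy remote with two commits touching the same file.
git init -q "$remote"
git -C "$remote" config user.email demo@example.com
git -C "$remote" config user.name demo
mkdir -p "$remote/path/to"
echo v1 > "$remote/path/to/file"
git -C "$remote" add .
git -C "$remote" commit -q -m "first version"
old=$(git -C "$remote" rev-parse HEAD)
echo v2 > "$remote/path/to/file"
git -C "$remote" commit -q -a -m "second version"
new=$(git -C "$remote" rev-parse HEAD)

# Assumption: the server allows requesting arbitrary commit ids.
git -C "$remote" config uploadpack.allowAnySHA1InWant true

git init -q "$local_repo"
git -C "$local_repo" remote add origin "file://$remote"

# fetch_file <commit-sha> <path>: shallow-fetch one commit, print one file.
fetch_file() {
  git -C "$local_repo" fetch -q --force --depth 1 origin "$1:tmp" &&
    git -C "$local_repo" show "tmp:$2"
}

old_content=$(fetch_file "$old" path/to/file)
new_content=$(fetch_file "$new" path/to/file)
echo "$old_content" "$new_content"

# Clean up so nothing accumulates between runs.
git -C "$local_repo" branch -q -D tmp
git -C "$local_repo" gc -q
```

Each call still downloads the whole snapshot for that commit, as noted above; the helper just keeps the bookkeeping to one reusable temporary branch.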

hobbs
2

It's possible but may not work if the server side configuration is not under your control.

Git has a built-in command which can retrieve a file in a specific commit.

git archive --remote=<url_to_the_repo> <commit> --format=tar <path> | tar xvf -

Some hosting services don't allow git archive --remote. If so, this command cannot work at all.

Some hosting services disable fetching unadvertised objects. If so, <commit> must be a valid ref name rather than a sha1 value. With a ref, you can only get the file as of that ref's tip commit. A commit that no ref points to cannot be retrieved this way.
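As an illustration of the ref-only case, the sketch below runs git archive --remote against a throwaway local repo, which stands in for a server that permits the command; the branch name demo-branch and the file path are hypothetical. tar -xOf - writes the extracted file to stdout instead of to disk.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
remote="$work/remote"   # stands in for <url_to_the_repo> (hypothetical)

# Toy remote with one branch whose tip holds the second version of the file.
git init -q "$remote"
git -C "$remote" config user.email demo@example.com
git -C "$remote" config user.name demo
git -C "$remote" checkout -q -b demo-branch
mkdir -p "$remote/path/to"
echo v1 > "$remote/path/to/file"
git -C "$remote" add .
git -C "$remote" commit -q -m "first version"
echo v2 > "$remote/path/to/file"
git -C "$remote" commit -q -a -m "second version"

# Ask the remote for a tar of just that path at the branch tip,
# then extract it straight to stdout.
content=$(git archive --remote="$remote" --format=tar demo-branch path/to/file |
          tar -xOf -)
echo "$content"
```

Only the tip of demo-branch is reachable this way, so the first version of the file stays out of reach unless some ref points at it.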

Another possible but impractical method is to create tags for all blob objects of the files you are interested in, and then fetch the tags to retrieve their contents.

One of the more practical methods is to fetch all the necessary data, and use git show <commit>:<path> to read the content. It takes time and disk space, but it's very reliable. And avoid git checkout if possible, to save a bit of time and space.

ElpieKay
1

If you happen to know a particular remote branch which contains this commit, you may fetch this branch alone:

git fetch origin some_branch

Then, check out the file at the exact commit you want:

git checkout abc123 -- path/to/your/file.ext
Tim Biegeleisen
  • No I don't want to fetch whole branch. I have commits from several branches and I want to rely only on commit. – itthrill Dec 18 '20 at 04:24
  • 2
    ...you can always drop the branch after you are done. – Tim Biegeleisen Dec 18 '20 at 04:25
  • :) sorry but I need to check same file from hundreds of commits spread over dozen of branches. In my case it is not practical to maintain list of both commit ids and branch names. also same commit could be part of several branches. I need to rely only on commit id. – itthrill Dec 18 '20 at 04:27
1

Unfortunately, you cannot do what you describe in your question and your comments.

Git doesn't work at the file level like Subversion or other source code management systems. Git works at the snapshot level. Every commit is like a snapshot of your code (this is a very simple model of how Git works; it's more complicated under the hood). Therefore, the only way to get the files that you want is

  • first get the snapshots from the server to your local machine (git-fetch)
  • second, once you have the snapshots, you can extract files from the snapshots (git-checkout).

The answer from @hobbs to this same question shows you how to do it.

aalbagarcia
  • 1
    "Also, you cannot download a single commit using the git-fetch command." of course you can. That's not the only thing wrong in this answer. – jthill Dec 18 '20 at 07:22
  • Well, I don't think that is wrong: https://stackoverflow.com/questions/3697707/how-do-i-download-a-specific-git-commit-from-a-repository I've updated the answer to include the link, If you could point me the other things that are wrong, I'll be more than happy to correct them. – aalbagarcia Dec 18 '20 at 07:31
  • You can fetch a single commit with `--depth=1`. You can fetch a single *arbitrary* commit if the remote has that configured. You can fetch a single branch with e.g. `git fetch origin master:refs/remotes/origin/master`, you don't need to edit a file (and then edit it again when you're done). – jthill Dec 18 '20 at 07:45
  • It sounds like OP doesn't even want to download the commit, just the single file, from various commits. Git is not built to easily (or at all) do this. – Lasse V. Karlsen Dec 18 '20 at 07:47
  • @jthill Thanks a lot for your comments. I'm looking into your answer and what you mentioned about having a remote properly configured. I would say that with the command git-fetch you can fetch a single arbitrary commit using --depth=1 (again, thanks for that) – aalbagarcia Dec 18 '20 at 08:23