
Assume there is a big Git repo, Orig, under active development and with a long history. That repository also contains some big files but does not use git-lfs. I don't control that repo; in particular, I can't migrate it to git-lfs.

I'd like to create a fork repository (Fork) with the following properties:

  1. I can easily merge changes from the Orig development into the corresponding Fork branches and into my custom branches based on those branches (cherry-picking each change doesn't count as "easily").
  2. I can easily create pull requests to the Orig repo with some of my changes in the Fork (less important than #1 but nice to have).
  3. I want the history of the Fork to be much shallower (smaller clone).
  4. I want the history of the Fork to contain far fewer big files, because I want to host my Fork on GitHub. This is an issue because git-lfs migrate rewrites the history.

Incidentally, just solving #3 will probably fix #4, because only a few really big files would require git-lfs, and none of them are in the part of the history I really want to keep in my Fork.

Achieving just #4 without #3 (but still getting #1 and #2) is acceptable because #4 is really a blocker while #3 is only nice-to-have.
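
The assumption that the big files live only in the old part of the history can be checked before deciding anything. Below is a minimal sketch that builds a throwaway repository standing in for Orig and lists the largest blobs reachable from a revision range. The repository layout, the file names, the `old-part` tag, and the helper `largest_blobs` are all made up for illustration; only the `git rev-list --objects` and `git cat-file --batch-check` plumbing is standard Git.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for Orig (every name here is hypothetical).
tmp=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/orig"
cd "$tmp/orig"
git symbolic-ref HEAD refs/heads/master

# Old commit: adds a "big" file (only 200 kB, to keep the demo fast).
head -c 200000 /dev/zero > big.bin
git add big.bin
git commit -q -m "add big file"
git tag old-part

# Later commits: the big file is removed and only small files change.
git rm -q big.bin
echo "small" > small.txt
git add small.txt
git commit -q -m "drop big file, add small one"
echo "more" >> small.txt
git commit -q -am "recent work"

# List blobs reachable from the given revisions, largest first.
# Note: the awk step keeps only the first word of each path.
largest_blobs() {
    git rev-list --objects "$@" |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { print $2, $3 }' |
        sort -rn | head -n 5
}

all_history=$(largest_blobs master)
recent_only=$(largest_blobs old-part..master)

echo "whole history:";  echo "$all_history"
echo "after old-part:"; echo "$recent_only"
```

In this toy repository `big.bin` should appear only in the whole-history listing. On a real Orig clone the same function could be pointed at the range one intends to keep, to see whether anything in it would actually need git-lfs.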

Things I don't really need:

  • I don't need to keep all the Orig historical branches in my Fork. I really need only master. A few recent feature branches as well would be nice.
  • It is acceptable for the initial setup to be complicated and time-consuming as long as the main workflow after that works smoothly.

Some obvious things that I believe do not work:

  • Run git-lfs migrate to fix #4 and push the result to GitHub. This seems to break #1 and #2, since it creates a totally new history with no connection to the original one, so I cannot easily merge new changes from Orig into Fork.
  • Create a shallow copy and push it as a new repository to GitHub. This fixes #4 and probably #3, but again breaks #1 and #2.
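
For reference, the shallow-copy idea can be probed locally with a sketch like the one below. It uses a throwaway repository in place of Orig (paths, file names, and the depth of 2 are arbitrary choices for the demo) and only shows what a shallow clone looks like on disk: truncated history, but the same commit IDs as Orig. Whether a hosting service accepts a push from such a clone is a separate question that this sketch does not test.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for Orig with five commits on master.
git init -q "$tmp/orig"
cd "$tmp/orig"
git symbolic-ref HEAD refs/heads/master
for i in 1 2 3 4 5; do
    echo "change $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done
orig_tip=$(git rev-parse master)

# A file:// URL is used because Git ignores --depth for plain local-path clones.
git clone -q --depth 2 "file://$tmp/orig" "$tmp/shallow"
cd "$tmp/shallow"

shallow_tip=$(git rev-parse HEAD)
shallow_count=$(git rev-list --count HEAD)
is_shallow=$(git rev-parse --is-shallow-repository)

echo "tip in Orig:       $orig_tip"
echo "tip in shallow:    $shallow_tip"
echo "commits visible:   $shallow_count"
echo "shallow repository: $is_shallow"
```

The tip commit keeps its original ID because a shallow clone does not rewrite anything; it only records where the history was cut off.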

I suspect there is no good solution to this problem, given how Git hashes work. At least the questions "Partial repository mirror of git repositories" and "Git mirror of non-LFS repo to LFS-repo" do not look really promising. But I still hope it can be done with some git magic (or that things have somehow changed over the last few years).

SergGr
  • I'm not sure how well this would work, but maybe you could stitch two semi-related repositories using replace refs? As I understand `git filter-repo` should be able to create replace refs, and I found [this](https://github.com/newren/git-filter-repo/issues/7) where someone provides a script for migrating to lfs that you might be able to use as a starting point (might need to also add `--replace-refs update-and-add`). – Hasturkun Dec 14 '21 at 12:29
  • @Hasturkun just migrating the `Fork` repo to LFS is trivial. The problem is that the `Orig` repo is not going to be migrated to LFS, but I still want to be able to cross-merge (future) changes between `Orig` and `Fork` (in a mostly automatic way). I don't see how your plan solves this issue. – SergGr Dec 14 '21 at 15:12
  • The main point of my suggestion was setting up of replace refs, which `filter-repo` could do for you while importing into LFS (making it possible to refer to the new commits by their old IDs). I was thinking of something similar to the example given [here](https://git-scm.com/book/en/v2/Git-Tools-Replace), but reading further, I'm not sure it will help you since I'm not sure that GitHub PR will actually respect these (also, it looks like `filter-repo` will only create replace refs for commits, not other objects). – Hasturkun Dec 14 '21 at 16:25
  • Does your #3 mean that, say, `Orig` has commits `a-b-c-d-e-f-g` and `Fork` only has commits `e'-f'-g'`? AFAIK, if you make `e'` the root commit, it can by no means be the same as the original `e` since they have different parents, and the same goes for `f'` and `g'`; therefore you cannot cross-merge `Orig` and `Fork`. – lzhh Dec 15 '21 at 02:25
  • @lxvs technically I do not have such a requirement, although if it were possible, it would solve my problem. What I imagine I need is a local git repo with 2 remotes: `Orig` and `Fork`. I would have an `orig_master` branch that I can synchronize with `Orig`'s `master`, a `fork_master` branch that I can synchronize with `Fork`'s `master`, **and** I could somehow merge changes from `orig_master` into `fork_master`. How this happens "inside" is irrelevant. @Hasturkun's replace idea actually looks somewhat promising but I still haven't tried it in practice. – SergGr Dec 16 '21 at 00:07
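
The point about `e` versus `e'` in the comments above can be reproduced in a few lines. This is only a sketch in a throwaway repository: the commit names follow the `a-b-c-d-e` example, the branch names `orig_master` and `fork_master` are borrowed from the last comment, and re-creating `e` with `git commit-tree` is just one assumed way of producing a truncated root commit.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/orig_master

# History a-b-c-d-e on orig_master.
for c in a b c d e; do
    echo "$c" > file.txt
    git add file.txt
    git commit -q -m "$c"
done
e=$(git rev-parse orig_master)

# Re-create e as a root commit e': same tree, same message, no parent.
e_prime=$(echo "e" | git commit-tree "$e^{tree}")
git branch fork_master "$e_prime"

tree_e=$(git rev-parse "$e^{tree}")
tree_e_prime=$(git rev-parse "$e_prime^{tree}")

# git merge-base exits non-zero when two commits share no ancestor.
if git merge-base orig_master fork_master >/dev/null; then
    related=yes
else
    related=no
fi

echo "e  = $e  (tree $tree_e)"
echo "e' = $e_prime  (tree $tree_e_prime)"
echo "common ancestor: $related"
```

Both commits point at the same tree, yet their IDs differ and Git finds no common ancestor, which is why a plain merge between the two branches has nothing to start from.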
