Merging changes from an "embedded" upstream project

Question

I have a git repository that contains several files that come from an upstream source. I have a few local modifications to these files, but they are largely the same as the upstream versions, and I would like to be able to stay in sync with upstream releases. I don't need upstream history, but if there is a new release, I'd like to be able to merge that in while still keeping my own changes. As a result, it's not as simple as just copying the upstream files into my repository, because that will result in my changes being lost, and it's a real pain to manually run vimdiff or something similar to ensure my changes get added.

Right now I've come to a solution that looks like this:

1. Create an orphan branch that is completely empty (call it upstream).
2. Add the upstream files to this branch (so they're the only thing in it).
3. Merge upstream into my main branch, passing --allow-unrelated-histories.
4. Apply my changes to the files and commit.
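The four steps can be sketched as a small Python script that shells out to the `git` CLI. This is a minimal sketch that assumes git is on PATH. The file names `main.cpp` and `lib.hpp`, the branch name `main`, the commit messages and the dummy committer identity are hypothetical stand-ins.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Hypothetical committer identity so the sketch runs without any git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Example", "GIT_AUTHOR_EMAIL": "example@example.com",
       "GIT_COMMITTER_NAME": "Example", "GIT_COMMITTER_EMAIL": "example@example.com"}


def git(repo: Path, *args: str) -> str:
    """Run `git <args>` inside `repo`, raise on failure, and return stdout."""
    return subprocess.run(["git", *args], cwd=repo, env=ENV, check=True,
                          capture_output=True, text=True).stdout


def setup_vendored_file(root: Path) -> Path:
    """Build a scratch repo following the four steps and return its path."""
    repo = root / "project"
    repo.mkdir()
    git(repo, "init", "-q")
    git(repo, "symbolic-ref", "HEAD", "refs/heads/main")  # name the first branch "main"
    (repo / "main.cpp").write_text('#include "lib.hpp"\n')
    git(repo, "add", "main.cpp")
    git(repo, "commit", "-q", "-m", "Initial project commit")

    # 1. Orphan branch "upstream": no parent commit, and main's files removed.
    git(repo, "checkout", "-q", "--orphan", "upstream")
    git(repo, "rm", "-r", "-q", "-f", "--ignore-unmatch", ".")

    # 2. The downloaded upstream file is the only thing committed on it.
    (repo / "lib.hpp").write_text("inline int helper() { return 1; }\n")
    git(repo, "add", "lib.hpp")
    git(repo, "commit", "-q", "-m", "Upstream release 1")

    # 3. Merge into main; the flag is needed because the histories share no commit.
    git(repo, "checkout", "-q", "main")
    git(repo, "merge", "-q", "--no-edit", "--allow-unrelated-histories", "upstream")

    # 4. Apply the local modification on main and commit it.
    (repo / "lib.hpp").write_text("inline int helper() { return 2; }  // local fix\n")
    git(repo, "commit", "-q", "-a", "-m", "Local change to lib.hpp")
    return repo


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(git(setup_vendored_file(Path(tmp)), "log", "--oneline", "--graph", "--all"))
```

The resulting graph shows `upstream` as a separate root commit that becomes the second parent of the merge on `main`.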

Now I should be able to commit new upstream releases to that branch and keep merging it into my main branch, while keeping my changes intact. It seems to work but feels hacky. Is there a more appropriate solution to this problem?

Edit:

Here's a scenario that mimics what I'm doing: there is a header-only C++ library available for download somewhere. It's not in a Git repository; it's just a bare file that's periodically updated. I'm using that file, but I have some local changes to it. I want to be able to track changes in future downloads while still keeping my changes to the file (with conflict resolution when necessary). I want the file to be part of my repository, so I don't want downloading and patching to be part of the build process. I'd prefer to use Git to do the merging and conflict resolution.

*I don't need upstream history, but if there is a new release, I'd like to be able to merge that in while still keeping my own changes.* For that, you need upstream history. :-) Maybe not the exact same history, but *something*. Might as well just use what they have. It takes more space, sure, but it will be a lot easier, and disk space is cheap. (WD has 20 TB drives coming out, I think, and 14 TB Red drives have been out for a while. The optimal price point is probably a bit lower than that: I see WD Red 12 TB drives for $300.) — torek, Nov 09 '21 at 07:01
I keep thinking I'll build a new NAS box ... put in 4 of those for $1200 in a ZFS raidz1 configuration, and that's about 36 TB usable, or about $33/TB. I remember when a 10 *megabyte* hard drive was hundreds of dollars. As things stand, though, I haven't come close to needing 1 TB of storage, despite a lot of photos. — torek, Nov 09 '21 at 07:08

TTT · Accepted Answer · 2021-11-08T18:02:39.720

Perhaps it feels hacky because you have a persistent, unrelated branch in your repo. Although that's unusual (in my experience), it is a pretty good representation of what you're trying to achieve: a third-party relationship without depending on a separate repo. Given that, I don't see an issue with your proposed solution. Some notes, though:

- You should only need to use --allow-unrelated-histories the first time, since from then on they will be related.
- You shouldn't have to "apply your changes" each time you merge. Instead it would be "resolve conflicts", if there are any. The initial merge may be the most complex, but after that it should be simpler, perhaps even automatic.

The assumption is that every time you wish to update the third-party file(s), you switch to the upstream branch, drop in the files, commit with a message describing the version and/or date of the files, then switch back to your working branch, merge in upstream, and resolve any conflicts. That's pretty clean, IMHO.
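The update cycle in the answer (switch to upstream, drop in the files, commit, switch back, merge) might look like the following Python sketch, which again shells out to git. `release()`, `update_upstream()` and `demo()` are hypothetical helpers. `release()` fabricates header contents whose upstream edits are kept several lines away from the locally patched line, so the three-way merge is clean. The helper does not handle files deleted upstream.

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical committer identity so the sketch runs without any git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Example", "GIT_AUTHOR_EMAIL": "example@example.com",
       "GIT_COMMITTER_NAME": "Example", "GIT_COMMITTER_EMAIL": "example@example.com"}


def git(repo: Path, *args: str) -> str:
    """Run `git <args>` inside `repo`, raise on failure, and return stdout."""
    return subprocess.run(["git", *args], cwd=repo, env=ENV, check=True,
                          capture_output=True, text=True).stdout


def release(version: int, answer: int) -> str:
    """Hypothetical header text for an upstream release; upstream only edits the
    first two lines, well away from the helper() line we patch locally."""
    filler = "".join(f"// unchanged upstream line {i}\n" for i in range(5))
    return (f"// lib.hpp, upstream release {version}\n"
            f"inline int answer() {{ return {answer}; }}\n"
            + filler
            + "inline int helper() { return 1; }\n")


def update_upstream(repo: Path, files: Dict[str, str], message: str) -> None:
    """Drop new upstream files onto "upstream", commit, then merge into "main"."""
    git(repo, "checkout", "-q", "upstream")
    for name, text in files.items():
        (repo / name).write_text(text)
        git(repo, "add", name)
    git(repo, "commit", "-q", "-m", message)
    git(repo, "checkout", "-q", "main")
    # No --allow-unrelated-histories: the first merge already related the branches.
    # If the merge stops on conflicts, check=True raises CalledProcessError and the
    # conflicts are left in the working tree to resolve by hand.
    git(repo, "merge", "-q", "--no-edit", "upstream")


def demo(root: Path) -> Path:
    repo = root / "project"
    repo.mkdir()
    git(repo, "init", "-q")
    git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
    (repo / "main.cpp").write_text('#include "lib.hpp"\n')
    git(repo, "add", "main.cpp")
    git(repo, "commit", "-q", "-m", "Initial project commit")
    # One-time setup from the question: orphan branch + unrelated-histories merge.
    git(repo, "checkout", "-q", "--orphan", "upstream")
    git(repo, "rm", "-r", "-q", "-f", "--ignore-unmatch", ".")
    (repo / "lib.hpp").write_text(release(1, 41))
    git(repo, "add", "lib.hpp")
    git(repo, "commit", "-q", "-m", "Upstream release 1")
    git(repo, "checkout", "-q", "main")
    git(repo, "merge", "-q", "--no-edit", "--allow-unrelated-histories", "upstream")
    # Local modification on main.
    patched = (repo / "lib.hpp").read_text().replace(
        "{ return 1; }", "{ return 2; }  // local fix")
    (repo / "lib.hpp").write_text(patched)
    git(repo, "commit", "-q", "-a", "-m", "Local change to lib.hpp")
    # Later, a new release is downloaded: one commit on upstream, one merge into main.
    update_upstream(repo, {"lib.hpp": release(2, 42)}, "Upstream release 2")
    return repo
```

After `demo()` runs, `lib.hpp` on `main` carries both the release 2 edits and the local `helper()` fix without any manual re-application.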

Optional tweak: in larger repos, switching to an empty branch could be time-consuming, because Git has to delete all the tracked files in your working tree and then write them all again when you switch back to your working branch. An alternative is, instead of an orphan empty branch in the same repo, to put that branch in a separate mini repo locally that mimics the directory structure of the files in question. In that case the mini repo could have just a single branch called "main" that plays the role of the upstream branch in your proposed solution. Then in your main repo you can set up a secondary remote (perhaps called "upstream") pointing at that local repo, fetch from it, and merge upstream/main into your working branch. This may solve the hackiness problem as well, but it does violate your constraint of not depending on another repo. At least that repo is your own, though.
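The separate mini-repo variant can be sketched the same way: a local vendor repo with a single `main` branch is added to the project as a remote named `upstream`, fetched, and `upstream/main` is merged. This is a minimal sketch; the directory names, file names, `merge_vendor()` and its `first_time` flag are hypothetical.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Hypothetical committer identity so the sketch runs without any git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Example", "GIT_AUTHOR_EMAIL": "example@example.com",
       "GIT_COMMITTER_NAME": "Example", "GIT_COMMITTER_EMAIL": "example@example.com"}


def git(repo: Path, *args: str) -> str:
    """Run `git <args>` inside `repo`, raise on failure, and return stdout."""
    return subprocess.run(["git", *args], cwd=repo, env=ENV, check=True,
                          capture_output=True, text=True).stdout


def new_repo(path: Path, name: str, text: str, message: str) -> None:
    """Create a repo at `path` whose "main" branch has one commit adding `name`."""
    path.mkdir()
    git(path, "init", "-q")
    git(path, "symbolic-ref", "HEAD", "refs/heads/main")
    (path / name).write_text(text)
    git(path, "add", name)
    git(path, "commit", "-q", "-m", message)


def merge_vendor(project: Path, vendor: Path, first_time: bool) -> None:
    """Fetch the vendor repo as remote "upstream" and merge upstream/main."""
    if "upstream" not in git(project, "remote").split():
        git(project, "remote", "add", "upstream", str(vendor))
    git(project, "fetch", "-q", "upstream")
    # Only the very first merge needs to allow unrelated histories.
    flags = ["--allow-unrelated-histories"] if first_time else []
    git(project, "merge", "-q", "--no-edit", *flags, "upstream/main")


def demo(root: Path) -> Path:
    vendor, project = root / "vendor", root / "project"
    new_repo(vendor, "lib.hpp", "inline int helper() { return 1; }\n", "Upstream release 1")
    new_repo(project, "main.cpp", '#include "lib.hpp"\n', "Initial project commit")
    merge_vendor(project, vendor, first_time=True)

    # A later release is dropped into the vendor repo and committed there...
    (vendor / "lib.hpp").write_text("inline int helper() { return 3; }\n")
    git(vendor, "commit", "-q", "-a", "-m", "Upstream release 2")
    # ...and merged into the project as an ordinary related-history merge.
    merge_vendor(project, vendor, first_time=False)
    return project
```

Because the vendor repo has its own working tree, updating it never touches the project's checkout, which is the point of this variant for larger repos.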

"Apply my changes" was just part of the initial setup; agreed that it's automatic (barring conflict) after that first one. That's the impetus behind my desire to use Git for this, and though it seemed to be effective, I was a bit worried there would be problems. Thanks for your input. I'd been racking my brains for some way to make this as painless as possible and it's the best I could come up with. I don't see any reason to avoid it now. — Chris, Nov 08 '21 at 18:13
BTW, if you decide to go with the orphan empty branch in the same repo, there is a way to commit your new file changes to that branch without checking it out. See [this question](https://stackoverflow.com/q/7933044/184546) for details. Note: IMHO, the accepted answer to that question is incorrect, and you should look at [CB Bailey's slick answer](https://stackoverflow.com/a/7941509/184546) instead. Also, [Torek's answer](https://stackoverflow.com/a/41113460/184546) describing two worktrees is perhaps the best approach if your repo isn't ridiculously big. — TTT, Nov 08 '21 at 18:38
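One general way to commit to the upstream branch without checking it out uses plumbing commands with a throwaway index file (`GIT_INDEX_FILE`): `read-tree`, `hash-object -w`, `update-index --cacheinfo`, `write-tree`, `commit-tree` and `update-ref`. The sketch below illustrates that technique. It is not a transcription of the linked answers, and `commit_without_checkout()` and `demo()` are hypothetical names. The `git worktree` route is not shown.

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, Optional

# Hypothetical committer identity so the sketch runs without any git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Example", "GIT_AUTHOR_EMAIL": "example@example.com",
       "GIT_COMMITTER_NAME": "Example", "GIT_COMMITTER_EMAIL": "example@example.com"}


def git(repo: Path, *args: str, env: Optional[dict] = None, stdin: Optional[str] = None) -> str:
    """Run `git <args>` inside `repo`, raise on failure, and return stdout."""
    return subprocess.run(["git", *args], cwd=repo, env=env or ENV, input=stdin,
                          check=True, capture_output=True, text=True).stdout


def commit_without_checkout(repo: Path, branch: str, files: Dict[str, str], message: str) -> str:
    """Commit `files` on top of `branch` without touching HEAD, the real index
    or the working tree; return the new commit id."""
    ref = f"refs/heads/{branch}"
    parent = git(repo, "rev-parse", "--verify", ref).strip()
    with tempfile.TemporaryDirectory() as tmp:
        # A throwaway index file, so .git/index (main's staging area) is left alone.
        scratch = {**ENV, "GIT_INDEX_FILE": str(Path(tmp) / "index")}
        git(repo, "read-tree", parent, env=scratch)  # start from the branch's tree
        for path, text in files.items():
            blob = git(repo, "hash-object", "-w", "--stdin", stdin=text).strip()
            # 100644 = regular, non-executable file.
            git(repo, "update-index", "--add", "--cacheinfo",
                f"100644,{blob},{path}", env=scratch)
        tree = git(repo, "write-tree", env=scratch).strip()
    commit = git(repo, "commit-tree", "-p", parent, "-m", message, tree).strip()
    # Passing the old value makes update-ref refuse if the branch moved meanwhile.
    git(repo, "update-ref", ref, commit, parent)
    return commit


def demo(root: Path) -> Path:
    repo = root / "project"
    repo.mkdir()
    git(repo, "init", "-q")
    git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
    (repo / "main.cpp").write_text('#include "lib.hpp"\n')
    git(repo, "add", "main.cpp")
    git(repo, "commit", "-q", "-m", "Initial project commit")
    # One-time setup from the question (orphan branch, unrelated-histories merge).
    git(repo, "checkout", "-q", "--orphan", "upstream")
    git(repo, "rm", "-r", "-q", "-f", "--ignore-unmatch", ".")
    (repo / "lib.hpp").write_text("inline int helper() { return 1; }\n")
    git(repo, "add", "lib.hpp")
    git(repo, "commit", "-q", "-m", "Upstream release 1")
    git(repo, "checkout", "-q", "main")
    git(repo, "merge", "-q", "--no-edit", "--allow-unrelated-histories", "upstream")
    # Later updates: commit to "upstream" while "main" stays checked out, then merge.
    commit_without_checkout(repo, "upstream",
                            {"lib.hpp": "inline int helper() { return 3; }\n"},
                            "Upstream release 2")
    git(repo, "merge", "-q", "--no-edit", "upstream")
    return repo
```

Throughout `commit_without_checkout()`, `main` stays checked out and its working tree is never rewritten, which avoids the large-repo cost mentioned in the answer.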
