Summary: What are the best practices for long-running tracking of an upstream repository when you want to maintain a set of local changes?
I want to keep a fork on GitHub up to date with the upstream but still allow clear tracking of changes unique to the fork. (For this discussion, assume that upstream points to the main project repository and that origin refers to my fork of the repository.)
Imagine I have something like this where I forked a repository when upstream/master was at E.
Upstream:
A-B-C-D-E-F
Fork:
A-B-C-D-E ----- P ------T
         \-L-M-/ \-Q-R-/
After forking the repository I created two feature branches (L-M and Q-R) to add new features I needed and merged them back into my origin/master. So now my branch has improvements that don't exist upstream.
I find that upstream has a couple of interesting fixes, so I want to get back into sync with upstream. Based upon most references I have found (GitHub fork), the recommended way to do this is to merge upstream/master into your origin/master and continue on your way. So I would issue commands like:
git checkout master
git fetch upstream
git merge upstream/master
git push
Then I would end up with repositories that look like this:
Upstream:
A-B-C-D-E-F
Fork:
A-B-C-D-E ----- P ------T-F'
         \-L-M-/ \-Q-R-/
There are a couple of problems I see with this though.
1. I don't actually have commit F in my repo; I have F', which has the same content but a different hash. So I can't easily reference commits between the two repositories and know that I have a change. (It gets even more complex when considering that upstream probably has more than one change and has its own set of feature branches that have been merged.)
2. As I move forward and continue doing this, it becomes increasingly difficult for me to know what changes I have in my repository beyond what is in the upstream. For example, I may submit some of these changes back upstream while continuing to add my own refinements. After several iterations of this, how does anyone looking at my repository know how it differs from upstream? (Is there a git command to find these changes?)
3. Similar to #2, how would someone find a fix in upstream and check to see if my fork contains the fix?
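For the second and third problems, a minimal sketch of the kind of git queries involved. It builds a throwaway pair of repositories so it can be run anywhere; the directory names, file names, and commit messages are hypothetical stand-ins for the E, P, and F commits in the diagrams, and it assumes a reasonably recent git (1.8.5 or later for `git -C`).

```shell
set -e
# Fixed identity so commits work on a machine without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Upstream repository with commit E on master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo e > upstream/e.txt
git -C upstream add e.txt
git -C upstream commit -q -m E

# The fork: a clone whose remote is named "upstream", plus a local-only commit P.
git clone -q -o upstream upstream fork
echo p > fork/p.txt
git -C fork add p.txt
git -C fork commit -q -m P

# Upstream moves on with a fix F.
echo f > upstream/f.txt
git -C upstream add f.txt
git -C upstream commit -q -m F

# Sync the fork by merging, as in the commands above.
cd fork
git fetch -q upstream
git merge -q --no-edit upstream/master

# Problem 2: commits reachable from the fork but not from upstream.
fork_only=$(git log --no-merges --format=%s upstream/master..master)

# Problem 3: is a given upstream commit contained in the fork?
fix=$(git rev-parse upstream/master)
if git merge-base --is-ancestor "$fix" master; then
    has_fix=yes
else
    has_fix=no
fi

echo "fork-only commits: $fork_only"
echo "fork contains upstream fix: $has_fix"
```

In this sketch the first query lists only P and the second reports yes. For changes that were submitted upstream and applied there as different commits, `git cherry upstream/master master` compares by patch content rather than by hash, which may be closer to what is wanted in that case.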
I guess the root of the problem is there is no way for me to guarantee that my repository is in "sync" with the upstream at any given point because the code and the hashes are not the same. So how do I go about tracking the changes accurately and keep myself from going insane trying to keep things in sync?
Note: I had considered using rebase to keep rebasing my repository off upstream, but this has an entirely different set of issues. For example, if anyone references my repository through submodules, branches, etc., then the history rewrite will break their references. Additionally, I don't think my branch history would survive the rebase, so I would not have a complete view of all the feature branches I had made and the associated history.
How do other people handle this? What are some best practices I should be looking into?
Update:
Based upon feedback from Seth, I created a set of test repositories to show what I was talking about and how it works out the way he says.
The repositories are:
They should show more clearly how merging from upstream looks when there are local changes as well.
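A similar experiment can be reproduced locally without any hosted repositories. The sketch below is not the original test setup; the names are hypothetical, and it only checks one thing: after a plain merge, the commit that came from upstream is reachable in the fork under the same hash it has upstream.

```shell
set -e
# Fixed identity so commits work on a machine without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Upstream with commit E; the fork is a clone with one local commit P.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo e > upstream/e.txt
git -C upstream add e.txt
git -C upstream commit -q -m E

git clone -q -o upstream upstream fork
echo p > fork/p.txt
git -C fork add p.txt
git -C fork commit -q -m P

# Upstream adds F, and the fork merges it.
echo f > upstream/f.txt
git -C upstream add f.txt
git -C upstream commit -q -m F
git -C fork fetch -q upstream
git -C fork merge -q --no-edit upstream/master

# Hash of F in upstream, and hash of the merge commit's second parent in the fork.
upstream_f=$(git -C upstream rev-parse master)
fork_f=$(git -C fork rev-parse 'master^2')

# Show the fork's history as a graph: P and F both feed into the merge commit.
git -C fork log --graph --oneline master

if [ "$upstream_f" = "$fork_f" ]; then
    same_hash=yes
else
    same_hash=no
fi
echo "F has the same hash in both repositories: $same_hash"
```

The new commit created by the merge is the merge commit itself, which has P and F as its parents; F is not copied or rewritten.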