
I'm new here and just wandering around in Git.

Recently I got the chance to work on a huge repository (I believe it ranks among the largest in Git's history). The repo averages 3-5 commits per minute (no kidding).

Let's get back to the problem.

Since the repo is huge, we use PR-based merging, meaning no one can commit directly to five pre-defined branches. To get a commit onto the develop branch we have to create a custom feature branch, which regularly goes out of sync; to avoid that, we push the empty branch first and then start developing. That results in the following scenario locally.

Mainline branch 

a--b--c--d--e--f--g--h--i--j
                      \
                       1--2   (custom/feature branch)

So in the given case I want my branch (custom/feature branch) to look like this:

a--b--c--d--e--f--g--h--i--j--1--2

I know a way to achieve this, by performing the following operations on the custom branch:

git reset --hard HEAD~2
git pull origin Mainline --ff-only
git reflog | grep "commit"
git cherry-pick ######1
git cherry-pick ######2

This method ensures that I haven't lost any of my commits and that my branch is synced with the Mainline branch without any merge commits.

So the question here is: "It's been hectic to do this for all the commits/branches, so is there any way we can have it done in 1-2 commands?"

I was also confused about Git's behaviour here: since we use the Bitbucket web interface, it shows us whenever a conflict occurs in the PR section while merging the custom branch into the mainline branch.

So the doubt here is: is it necessary to keep the custom branch updated? (Since we are using the recursive strategy for merging, I don't feel it is absolutely necessary.)

2 Answers


> So the question here is "It's been hectic to do this for all the commits/branches, so is there any way we can have it done through 1-2 commands?"

$ git fetch origin Mainline
$ git rebase origin/Mainline

Beware that this will not update the “local” Mainline, but if you are not supposed to work in it directly you might as well delete it and only keep the remote.

If you have multiple useful branches, you may want to leave the branch name out and add `-p` (`--prune`) to the `fetch`, e.g.

$ git fetch -p origin

This will update all the remote-tracking branches for origin, and will automatically delete (prune) the local copies of branches which have been deleted on the remote.

If you have multiple remotes, you may want to replace origin with --all, keeping the -p, in the same command: --all will go through every remote and update all their branches.
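To summarise, the three fetch variants (using the remote and branch names from the question) are:

```shell
# fetch just one branch from one remote
git fetch origin Mainline

# fetch every branch from origin, pruning remote-tracking
# branches whose upstream branch has been deleted
git fetch -p origin

# fetch every branch from every configured remote, pruning as it goes
git fetch -p --all
```

After any of these, `git rebase origin/Mainline` replays your feature commits on top of the freshly fetched tip.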

`git pull --rebase origin Mainline` should also work, but I'm not fond of `git pull`: I'd rather perform the "synchronise with remote" and "update local branches" steps separately and explicitly.

I'm a bit surprised that this would not be provided as an example workflow by the contributor documentation or training.

> So the doubt here is that, is it necessary to keep the custom branch updated? (as we are using recursive strategy for merging I don't feel it is absolutely necessary.)

Technically? No, git doesn't care.

If this is mandated by the project's contribution guidelines, though, there may be practical reasons for it: history visualisation tools tend to have trouble with many branches overlapping one another, and rebasing first avoids that overlap, since each branch then basically spans just two commits.
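You can check what the graph looks like with plain git, no extra tooling required:

```shell
# show every branch as an ASCII graph; with rebase-before-merge each
# merge "bubble" spans exactly two commits and the graph stays readable
git log --oneline --graph --all
```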

If I mandated this, I would add tooling to do it automatically though (some variant or mode of a "merge bot").

Masklinn
    The concept of rebasing before merging with a merge commit at PR completion is called "semi-linear merge". Azure Devops has it, GitLab almost has it, but GitHub doesn't have it yet though the request has been on the back-burner for years. – TTT Apr 05 '21 at 14:17
  • Nice answer. It might help to explain the "-p" paragraph a little more. I don't know that it's clear to most readers why that is helpful, and also which command you would add it to (though obviously only `fetch` has the option). – TTT Apr 05 '21 at 14:19
    @TTT good point, added a paragraph to explain `-p`, as well as the behaviour of the various variants of fetches. – Masklinn Apr 05 '21 at 15:52
  • @TTT "semi-linear merge" seems to be the MS/Azure term for it, I'd never heard of that term before. We just call it "rebase and merge" (though that conflicts with the likes of github, for which it means "rebase then merge --ff"). – Masklinn Apr 05 '21 at 15:56
  • I think the term is starting to become more widespread. GitLab uses [similar terminology](https://docs.gitlab.com/ee/user/project/merge_requests/reviewing_and_managing_merge_requests.html#semi-linear-history-merge-requests) but it's just a config to enforce it rather than automate it. Without it being a single click button and 3-5 commits going in per minute, it might be hard to ever complete a PR with that setting enabled! Based on this discussion it looks like [GitHub has a similar setting](https://github.com/isaacs/github/issues/1017) as GitLab. – TTT Apr 05 '21 at 16:03
  • @Masklinn One more question actually that I missed completely (sorry for it!). I would like to know more about merging strategies like squash and recursive. Currently we follow only the recursive strategy, but as already mentioned I'm working on a huge repo, so the number of commits is huge. Would it be better to start using squash merges? **Doubts**: does it create dangling commits? If it does, can they be removed by auto gc? And how much will squash merges increase the repository size (say I have to merge 200-250 commits per hour)? – Prasad Kasar Apr 12 '21 at 07:23
  • "Squash" merges may make the original commits dangling (though no more so than a rebase), the dangling commits will indeed be GC'd. Squash merges won't really increase the repository size in absolute terms: "merge commit" will have the same tree (snapshot), the "squash" commit will just have one less parent. It could go either way with respect to packfiles: the intermediate commits may help produce better delta chains, or the increased number of commits may hamper their discovery. – Masklinn Apr 12 '21 at 07:28
  • Also note that squashing is not an actual merge strategy. Merge strategies are about trying to find how to, well, merge trees. Squashing basically just denotes whether the final merge commit records the merge information or not. – Masklinn Apr 12 '21 at 07:30
  • BTW, just for your info: we are using Bitbucket for the repository, and our large, unpredictable number of active users has already slowed things down for the repository. We have already reached out to the support team and they are asking us to reduce the number of active PRs/branches, which is quite impossible for us as we have around 900 people working on the same repo, so the activity log is huge. We don't use any commit visualization tool; we only use the Bitbucket UI and local git for visualization (let's say we are new to git and want to stick to more traditional methods). – Prasad Kasar Apr 12 '21 at 07:30
  • @PrasadKasar FWIW we have a somewhat similar setup at $dayjob, our solution is that the development branches mostly live into a separate development repository, from which we create PRs to the main repository. That leads to a somewhat lower churn on the "master" repository, and less garbage there (as development / experimentation don't hit the main repository, only the dev one). We're not on bitbucket though so YMMV. – Masklinn Apr 12 '21 at 07:33
  • @Masklinn Yes, I totally agree with you and I would like to follow your strategy as well but seems it is not possible to have PRs which are from one repo to another in bitbucket. can you tell me how it can be achieved while maintaining the commit history? (It will be really appreciated if you do :D ) – Prasad Kasar Apr 12 '21 at 07:40
  • As I said we don't use bitbucket (we use github), however I don't see how cross-repository PRs would not be possible, that's the standard workflow of every OSS project: contributors create branches in their personal forks and create PRs against the main repository. You should just need to ensure the "dev" repository is a "proper fork" of the main one. This may require a separate organisation entirely though (it does on github) forking a repository back into the same account might understandably not be a supported use-case. – Masklinn Apr 12 '21 at 07:52
  • Okay that helps a lot thank you so much for the help. I will check with the team what fork settings we have and try to dig into it might solve the root issue. BTW we are not working on OSS Project its a complete closed source one. – Prasad Kasar Apr 12 '21 at 10:04
  • @PrasadKasar yes I just used the normal OSS project workflow to show that creating PRs from forks is necessarily supported as it's a very common requirement, the $dayjob repository I'm talking about is not OSS either. – Masklinn Apr 12 '21 at 10:21
  • @Masklinn how do you handle conflicts in such cases? suppose we are using main repository A and I'm working on a repository A` which is a fork of the A and while merging the A` back to A we are getting conflicts, so how do we handle this ? also if there is a huge gap in the timeline of A and A` then how do we deal with the syncing? – Prasad Kasar Apr 13 '21 at 04:18
  • @PrasadKasar the same way we otherwise do. And we don't use the reference branches from the fork (they've been rewritten and protected against pushes), those are taken from the main repository. Git deals rather well with multiple upstream repositories ("remotes"). – Masklinn Apr 13 '21 at 05:41
  • Basically you create your development branches from the main repository's references, but you push them into the dev repository. To resync them, simply update the main remote, rebase the branch on top of its reference (`git rebase main/reference-branch`), and push the update to the dev repo. – Masklinn Apr 13 '21 at 05:43
  • @Masklinn is there any way to do this without manual work? As we have very frequent commits we need an automated way; we can put the manual steps in a script, but there is still a slight chance that we miss changes on the (forked) remote repo while doing the pull-merge-push operation (this has already happened to us with the main repo). – Prasad Kasar Apr 14 '21 at 04:45
  • I don't know what you're asking. It's a standard rebase on a reference branch, there's nothing special to it. – Masklinn Apr 14 '21 at 05:36
  • @Masklinn if I'm the only user of the forked repo I can easily pull locally, but with 200-300 users on the forked repo I need to push the merge as well. The problem is that between pulling the source and destination and doing the merge-push operation, one of them may get updated; even with scripts we sometimes face such issues, and as I already mentioned it is due to the high commit frequency (the repo averages 3-5 commits per minute). So pulling on the forked repo and pushing it back to the forked remote is hard. – Prasad Kasar Apr 15 '21 at 02:20
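The two-remote fork workflow described in the comments above can be sketched as follows; the remote names and URLs are placeholders, not anything from the question:

```shell
# one-time setup: the main repo plus a separate dev fork (names assumed)
git remote add main git@example.com:org/main-repo.git
git remote add dev  git@example.com:org/dev-repo.git

# branch off the main repository's reference branch...
git fetch main
git switch -c my-feature main/Mainline

# ...but publish the branch to the dev repository
git push -u dev my-feature

# to resync later: update main, rebase, push the rewritten branch
git fetch main
git rebase main/Mainline
git push --force-with-lease dev my-feature
```

`--force-with-lease` refuses the push if someone else updated `my-feature` on the dev remote in the meantime, which addresses the race between pull and push mentioned in the last comment.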

Masklinn already explained that git-rebase(1) on upstream is what you want. TTT mentioned semi-linear merges.

Since this repository gets a lot of commits, it might be hectic to follow a semi-linear merge strategy, especially if you demand that some CI pass before a merge. To alleviate that you can batch a proposed merge: create an integration branch from the development branch, rebase several feature branches on top of it, do the CI run (if required), and then merge the integration branch into the development branch. You can mention the features that you merged, or their pull requests, in the merge commit message.
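Under assumed branch names (`develop`, `feature-a`, `feature-b`), that batching flow could look like:

```shell
# start an integration branch from the current development branch
git switch -c integration develop

# rebase each feature on top of integration, then fast-forward integration
git rebase integration feature-a
git switch integration
git merge --ff-only feature-a

git rebase integration feature-b
git switch integration
git merge --ff-only feature-b

# once CI passes on integration, land the whole batch as one merge commit
git switch develop
git merge --no-ff integration -m "Merge integration batch: feature-a, feature-b"
```

`git rebase integration feature-a` checks out `feature-a` and replays it on top of `integration`, so the subsequent `--ff-only` merge is guaranteed to be a fast-forward.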

You could do the same thing with merges into the integration branch followed by a final merge into the development branch, but that would create two levels of merges, which might not be what you want if you are following the semi-linear merge strategy (I don't know).

Guildenstern