
I have a remote with a history that looks like this:

[Image: commit graph of the remote's history]

As you can see O and P are merge commits and both of them closed their old branch so now there's only one branch.

I want to squash C-D-E-G-J-K-L-N into one commit and F-H-I-M into another commit because they are just tiny commits cluttering the history.

Locally I managed to squash C-D-E-G-J-K-L-N using the method described in the answer by John O'M. to this question, like this:

git checkout -b squashing-1 N
git reset --soft C~
git commit -m "Squashed history"
git replace N [ID_of_the_commit_i_just_made]

This works: locally, git log from main-branch correctly reports Q, P, O, X, M, I, etc. (X is the new squashed commit).

From here the next steps would be to (1) check out the main branch and merge in the changes, (2) delete the temporary local branch, then (3) push the changes to the remote repo. But (1) reports Already up to date and (3) reports Everything up-to-date, since there are no actual changes to the tree, which is exactly the point of all this.

I've also tried git push --force origin main-branch and git push --force-with-lease origin main-branch, but I got the same result: Everything up-to-date.


How can I correctly merge in these history changes and push them to BitBucket without having to re-create the entire repo?

ZanYzer

3 Answers


You essentially have a choice to make: do you wish to make everyone use the replacement references, or do you prefer to rewrite the entire repository and make everyone have a big flag-day during which they switch from "old repository" to "new repository"? Neither method is particularly fun or profitable. :-)

How replacements work

What git replace does is to add a new object into the Git repository and give it a name in the refs/replace/ name-space. The name in this name-space is the hash ID of the object that the new object replaces. For instance, if you're replacing commit 1234567..., the name of the new object (whose ID is not 1234567...—for concreteness, let's say it's fedcba9... instead) is refs/replace/1234567....

The rest of Git, when looking for objects, checks first to see if there is a refs/replace/<hash-id> object. If so (and replacing is not disabled), the rest of Git then returns the object to which the refs/replace/ name points, instead of the actual object. So when some other part of Git reads some commit that says "my parent commit is 1234567...", that other part of Git goes to find 1234567..., sees that refs/replace/1234567... exists, and returns object fedcba9... instead. You then see the replacement.

If you do not have the reference refs/replace/1234567..., though, your Git never swaps in the replacement object. (This is true whether or not you have the replacement object. It's the reference itself that causes the replacement to occur. Having the reference guarantees that you have the object.)
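As a rough illustration of the mechanism described above, the following sketch builds a throwaway repository, replaces one commit, and inspects the resulting reference. The commit messages A to D, the demo identity, and the variable names are all made up for this example; `git commit-tree` is used here only as a compact way to build the stand-in commit.

```shell
set -eu
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
for msg in A B C D; do git commit -q --allow-empty -m "$msg"; done

old=$(git rev-parse HEAD~1)            # commit C, the object to be replaced
base=$(git rev-parse HEAD~3)           # commit A
tree=$(git rev-parse "$old^{tree}")
# Stand-in for C: same tree, but its parent is A, so B drops out of view.
new=$(git commit-tree -p "$base" -m "Squashed B+C" "$tree")
git replace "$old" "$new"

# The reference is named after the replaced object and points at the new one.
ref_target=$(git rev-parse "refs/replace/$old")
git for-each-ref refs/replace/

# Deleting the reference switches the substitution off again, even though
# the stand-in object itself is still present in the repository.
git replace -d "$old" >/dev/null
visible_after_delete=$(git rev-list --count HEAD)
echo "commits visible after deleting the reference: $visible_after_delete"
```

The last step is the point made in the parenthetical above: the object alone does nothing, the reference is what triggers the swap.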

Hence, for some other Git to execute this same replacement process, you must deliver the refs/replace/ reference to that other Git.

Transferring replacements from one Git to another

In general, you would push such objects with:

git push <repository> 'refs/replace/*:refs/replace/*'

(or specifically list the one replace reference you wish to push). To fetch these objects:

git fetch <repository> 'refs/replace/*:refs/replace/*'

(You can add this fetch refspec to the fetch configuration in each clone. Using git fetch or git fetch <repository> will then automatically pick up any new replacement objects pushed. Pushing is still a pain, and of course this step has to be repeated on each new clone.)
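A minimal sketch of that per-clone configuration, assuming two local throwaway repositories (the names upstream and clone, the demo identity, and the commit messages are invented for the example):

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done

old=$(git rev-parse HEAD)              # commit C
base=$(git rev-parse HEAD~2)           # commit A
tree=$(git rev-parse "HEAD^{tree}")
new=$(git commit-tree -p "$base" -m "Squashed B+C" "$tree")
git replace "$old" "$new"

# A fresh clone does not receive refs/replace/* by default.
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"
before=$(git for-each-ref refs/replace/ | wc -l | tr -d ' ')

# Add the refspec once; later plain `git fetch` runs pick up replacements.
git config --add remote.origin.fetch 'refs/replace/*:refs/replace/*'
git fetch -q origin
after=$(git for-each-ref refs/replace/ | wc -l | tr -d ' ')
visible=$(git rev-list --count HEAD)
echo "replace refs before: $before, after: $after, visible commits: $visible"
```

The refspec is added without a leading +, matching the non-forced form shown above.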

Note that neither refspec here sets the force flag. It's up to you whether you want to force-overwrite existing refs/replace/ references, should such a thing happen.

Rewriting a repository

Alternatively, once you have replacements in place, you can run a repository-copying operation—by this, I mean a commit-by-commit copy, not a fast copy like git clone --mirror—such as git filter-branch. If this copying operation is run without disabling replacements, the replaced objects are not copied; instead, their replacements are copied. Hence:

git filter-branch --tag-name-filter cat -- --all

has the side effect of "cementing replacements" forever in the copied repository. You may then discard all the original references and all the replacement references. (The easy way to do this is to clone the filtered repository.)
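The effect can be checked on a scratch repository. This is only a sketch under assumed names (commits A to D, a demo identity); FILTER_BRANCH_SQUELCH_WARNING is set merely to skip the warning and delay that recent Git versions print before filter-branch starts.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
for msg in A B C D; do git commit -q --allow-empty -m "$msg"; done

old=$(git rev-parse HEAD~1)            # commit C
base=$(git rev-parse HEAD~3)           # commit A
tree=$(git rev-parse "$old^{tree}")
new=$(git commit-tree -p "$base" -m "Squashed B+C" "$tree")
git replace "$old" "$new"

# Before the rewrite, the real history still contains all four commits.
real_before=$(git --no-replace-objects rev-list --count HEAD)

git filter-branch --tag-name-filter cat -- --all >/dev/null 2>&1

# After the rewrite, the branch's real history no longer contains B:
# the replacement has been copied into the new commits.
real_after=$(git --no-replace-objects rev-list --count HEAD)
echo "real commits before: $real_before, after: $real_after"
```

The old commits remain reachable through refs/original/ and the replace reference until the filtered repository is cloned or those references are deleted.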

Of course, since this is a new and different repository, it is not compatible with the original repository or any of its clones. But it no longer requires careful coordination of the refs/replace/ name-space (since it no longer has any replacement objects!).

torek
  • I've not worked with `replace` much (in fact had to look it up to propose an answer here); do you know of specific drawbacks when using replacement refs? (I ask because of the "neither [is] fun or profitable" comment...) – Mark Adelsberger May 17 '17 at 16:10
  • Replacements work pretty well for *one* repository; the problem is that they don't transfer by default (on new clones) and people try to use them to paper over things that *require* them. The big flag day is a big pain for everyone, but it's a one-time pain. The replacement method is a smaller pain but it's ongoing. :-) – torek May 17 '17 at 16:38
  • I was hoping I could avoid rewriting the entire repo, but since this method "cements replacements" I think this is by far the best option. – ZanYzer May 18 '17 at 09:58

From here the next steps would be to (1) check out the main branch and merge in the changes, (2) delete the temporary local branch, then (3) push the changes to the remote repo.

It seems you misunderstand what git replace really did. There is nothing to merge, because the true history isn't changed in any way by git replace. Rather, replace makes a note off to the side that says "by default, when browsing the history, if you find this object, substitute this one instead". You actually can still see the real history, e.g. git --no-replace-objects log.

So replace creates the illusion of a rewritten history. In that it isn't a true rewrite and therefore doesn't create an "upstream rebase" situation for other developers, this is pretty cool. OTOH it cannot be trusted as a way to scrub sensitive data from the repo, since the rewrite really is just an illusion. And the output you get from git commands can be misleading, in that it can imply that the "real object" SHA ID is associated with the "replacement object" content (when in fact it's essentially certain that said content would not hash to said SHA).
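To make the "illusion" concrete, here is a small sketch on a scratch repository (commit messages, identity, and variable names are invented for the example). It shows Git reporting the replacement's content under the original commit's ID, and that re-hashing that content yields the replacement's ID rather than the original one.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
for msg in A B C D; do git commit -q --allow-empty -m "$msg"; done

old=$(git rev-parse HEAD~1)            # commit C
base=$(git rev-parse HEAD~3)           # commit A
tree=$(git rev-parse "$old^{tree}")
new=$(git commit-tree -p "$base" -m "Squashed B+C" "$tree")
git replace "$old" "$new"

# Asking for C's ID shows the stand-in's message...
shown=$(git log -1 --format=%s "$old")
# ...while the real object is still there, untouched.
real=$(git --no-replace-objects log -1 --format=%s "$old")
# The content displayed under C's ID actually hashes to the stand-in's ID.
rehashed=$(git cat-file commit "$old" | git hash-object -t commit --stdin)

echo "shown under $old: $shown"
echo "real content of $old: $real"
echo "content hashes to: $rehashed"
```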

What you really need to do if you decide to go ahead and share the replacement with origin is

git push origin 'refs/replace/*'
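A sketch of the whole round trip against a local bare repository standing in for origin (the paths, identity, and commit messages are assumptions made for this example). It also reproduces why pushing the branch alone reports that nothing changed.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/local"
cd "$work/local"
for msg in A B C D; do git commit -q --allow-empty -m "$msg"; done
git remote add origin "$work/origin.git"
git push -q origin HEAD:refs/heads/main-branch

old=$(git rev-parse HEAD~1)            # commit C
base=$(git rev-parse HEAD~3)           # commit A
tree=$(git rev-parse "$old^{tree}")
new=$(git commit-tree -p "$base" -m "Squashed B+C" "$tree")
git replace "$old" "$new"

# The branch tip did not move, so pushing the branch has nothing to send.
git push origin HEAD:refs/heads/main-branch

# Pushing the replace namespace is what shares the replacement.
git push -q origin 'refs/replace/*:refs/replace/*'
remote_replace_refs=$(git ls-remote origin 'refs/replace/*' | wc -l | tr -d ' ')
remote_tip=$(git ls-remote origin refs/heads/main-branch | cut -f1)
local_tip=$(git rev-parse HEAD)
echo "replace refs on origin: $remote_replace_refs"
```

The glob is quoted so that the shell passes it to Git unchanged.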

Be aware that there are a few known bugs/quirks, and the documentation suggests that there may be unknown bugs/quirks.

Mark Adelsberger
  • I knew I would be creating an illusion, but you're right, I didn't have a complete understanding of what `git replace` did. These steps came directly from the answer I linked to in my question. – ZanYzer May 18 '17 at 09:53

Note: you will need to make sure the server allows it: a new configuration variable core.usereplacerefs has been added with Git 2.19 (Q3 2018), primarily to help server installations that want to ignore the replace mechanism altogether.

See commit da4398d, commit 6ebd1ca, commit 72470aa (18 Jul 2018) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 1689c22, 15 Aug 2018)

add core.usereplacerefs config option

We can already disable replace refs using a command line option or environment variable, but those are awkward to apply universally. Let's add a config option to do the same thing.

That raises the question of why one might want to do so universally. The answer is that replace refs violate the immutability of objects. For instance, if you wanted to cache the diff between commit XYZ and its parent, then in theory that never changes; the hash XYZ represents the total state.
But replace refs violate that; pushing up a new ref may create a completely new diff.

The obvious "if it hurts, don't do it" answer is not to create replace refs if you're doing this kind of caching.
But for a site hosting arbitrary repositories, they may want to allow users to share replace refs with each other, but not actually respect them on the site (because the caching is more important than the replace feature).
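For comparison, the three ways of switching replacements off mentioned in that commit message can be tried side by side. This is a sketch on a scratch repository with invented commit messages and identity; the config form assumes Git 2.19 or later.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
for msg in A B C D; do git commit -q --allow-empty -m "$msg"; done

old=$(git rev-parse HEAD~1)            # commit C
base=$(git rev-parse HEAD~3)           # commit A
tree=$(git rev-parse "$old^{tree}")
new=$(git commit-tree -p "$base" -m "Squashed B+C" "$tree")
git replace "$old" "$new"

# Default: the replacement is honored, so B is hidden.
with_replace=$(git rev-list --count HEAD)
# Command-line option.
by_option=$(git --no-replace-objects rev-list --count HEAD)
# Environment variable.
by_env=$(GIT_NO_REPLACE_OBJECTS=1 git rev-list --count HEAD)
# Config option (here passed with -c; it could also be set in a config file).
by_config=$(git -c core.useReplaceRefs=false rev-list --count HEAD)

echo "default: $with_replace, option: $by_option, env: $by_env, config: $by_config"
```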


On the security side, Git 2.41 (Q2 2023) allows Git forges to disable the replace-refs feature while running git merge-tree.

See commit b6551fe (10 May 2023) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 80754c5, 15 May 2023)

merge-tree: load default git config

Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

The 'git merge-tree' command handles creating root trees for merges without using the worktree.
This is a critical operation in many Git hosts, as they typically store bare repositories.

This builtin does not load the default Git config, which can have several important ramifications.

In particular, one config that is loaded by default is core.useReplaceRefs.
This is typically disabled in Git hosts due to the ability to spoof commits in strange ways.

Since this config is not loaded specifically during merge-tree, users were previously able to use refs/replace/ references to make pull requests that looked valid but introduced malicious content.
The resulting merge commit would have the correct commit history, but the malicious content would exist in the root tree of the merge.

The fix is simple: load the default Git config in cmd_merge_tree().
This may also fix other behaviors that are affected by reading default config.
The only possible downside is a little extra computation time spent reading config.
The config parsing is placed after basic argument parsing so it does not slow down usage errors.


Git 2.42 (Q3 2023) introduces a mechanism to disable replace refs globally and per repository.

See commit 9c7d1b0, commit f117838, commit d24eda4 (06 Jun 2023) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit d9f9f6b, 22 Jun 2023)

repository: create read_replace_refs setting

Signed-off-by: Derrick Stolee

The 'read_replace_refs' global specifies whether or not we should respect the references of the form 'refs/replace/<oid>' to replace which object we look up when asking for '<oid>'.
This global has caused issues when it is not initialized properly, such as in b6551fe ("merge-tree: load default git config", 2023-05-10, Git v2.41.0-rc0 -- merge).

To make this more robust, move its config-based initialization out of git_default_config and into prepare_repo_settings().
This provides a repository-scoped version of the 'read_replace_refs' global.

The global still has its purpose: it is disabled process-wide by the GIT_NO_REPLACE_OBJECTS environment variable or by a call to disable_replace_refs() in some specific Git commands.

Since we already encapsulated the use of the constant inside replace_refs_enabled(), we can perform the initialization inside that method, if necessary.
This solves the problem of forgetting to check the config, as we will check it before returning this value.

Due to this encapsulation, the global can move to be static within replace-object.c.

There is an interesting behavior change possible here: we now have a repository-scoped understanding of this config value.
Thus, if there was a command that recurses into submodules and might follow replace refs, then it would now respect the core.useReplaceRefs config value in each repository.

'git grep --recurse-submodules' is such a command that recurses into submodules in-process.

VonC