Git copy file preserving history

Question

I have a somewhat confusing question about Git. Let's say I have a file dir1/A.txt committed, and Git preserves its history of commits.

Now I need to copy the file into dir2/A.txt (not move, but copy). I know that there is a git mv command but I need dir2/A.txt to have the same history of commits as dir1/A.txt, and dir1/A.txt to still remain there.

I'm not planning to update dir1/A.txt once the copy is created; all future work will be done on dir2/A.txt.

I know it sounds confusing. I'll add that this situation involves a Java-based module (a Mavenized project), and we need to create a new version of the code so that our customers can have two different versions at runtime; the first version will eventually be removed once the alignment is done. We could use Maven versioning, of course; I'm just new to Git and curious about what Git can provide here.

`git cp` does NOT work according to my 2022 experiment :( – Sridhar Sarnobat Apr 07 '22 at 22:49

score 235 · Answer 1 · edited Nov 01 '19 at 21:08

All you have to do is:

move the file to two different locations,
merge the two commits that do the above, and
move one copy back to the original location.

You will be able to see historical attributions (using git blame) and full history of changes (using git log) for both files.

Suppose you want to create a copy of file foo called bar. In that case the workflow you'd use would look like this:

git mv foo bar
git commit

SAVED=$(git rev-parse HEAD)
git reset --hard HEAD^
git mv foo copy
git commit

git merge $SAVED     # This will generate conflicts
git commit -a        # Trivially resolved like this

git mv copy foo
git commit

Why this works

After you execute the above commands, you end up with a revision history that looks like this:

( revision history )            ( files )

    ORIG_HEAD                      foo
     /     \                      /   \
SAVED       ALTERNATE          bar     copy
     \     /                      \   /
      MERGED                     bar,copy
        |                           |
     RESTORED                    bar,foo

When you ask Git about the history of foo, it will:

detect the rename from copy between MERGED and RESTORED,
detect that copy came from the ALTERNATE parent of MERGED, and
detect the rename from foo between ORIG_HEAD and ALTERNATE.

From there it will dig into the history of foo.

When you ask Git about the history of bar, it will:

notice no change between MERGED and RESTORED,
detect that bar came from the SAVED parent of MERGED, and
detect the rename from foo between ORIG_HEAD and SAVED.

From there it will dig into the history of foo.

It's that simple. :)

You just need to force Git into a merge situation where you can accept two traceable copies of the file(s), and we do this with a parallel move of the original (which we soon revert).
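To try the sequence without touching a real project, here is a minimal sketch that runs it in a throwaway repository. It is only an illustration under a few assumptions: the identity, the commit messages, and the branch name saved-copy (used in place of the $SAVED variable) are made up, and `git blame` is used as the check because it does not depend on log ordering.

```shell
#!/bin/sh
# Throwaway repository; the file names foo, bar and copy follow the answer.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
git config commit.gpgsign false

printf 'line one of the original file\nline two of the original file\n' > foo
git add foo
git commit -q -m "ORIG_HEAD: add foo"
orig=$(git rev-parse HEAD)

git mv foo bar
git commit -q -m "SAVED: rename foo to bar"
git branch saved-copy                     # hypothetical name, stands in for $SAVED

git reset -q --hard HEAD^
git mv foo copy
git commit -q -m "ALTERNATE: rename foo to copy"

# The merge is expected to stop with a rename/rename conflict,
# so its non-zero exit status is ignored here.
git merge --no-edit saved-copy || true
git commit -q -a -m "MERGED: keep both bar and copy"

git mv copy foo
git commit -q -m "RESTORED: rename copy back to foo"
git branch -q -d saved-copy

# Commit that git blame credits for the first line of each file.
blame_foo=$(git blame --porcelain foo | head -n 1 | cut -d ' ' -f 1)
blame_bar=$(git blame --porcelain bar | head -n 1 | cut -d ' ' -f 1)
echo "orig=$orig"
echo "foo=$blame_foo"
echo "bar=$blame_bar"
```

If the three printed hashes match, both files are attributed to the commit that originally added foo.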

edited Nov 01 '19 at 21:08

Maxim Belkin

answered May 18 '17 at 00:31

Peter Dillinger

I think this is solid! My only suggestion would be to use a branch instead of an environment variable to keep track of the first commit. True, it means deleting the branch after the merge. But for one, it looks better. Also, there is a slight chance that a purge could be run on the git repo between the time you reset and commit the merge. Using a branch removes that possibility. – John Chesshir May 31 '17 at 02:56
11

This doesn't seem to work, at least not with git `2.9`. I must use `--follow` or `-C` flags in order for git to trace `bar` to its `foo` origins. `cp foo bar && git add bar && git commit` gives the same end result without the weird history. Am I doing something wrong? – stefanmaric Jun 16 '17 at 23:54
4

@peter-dillinger, great workaround! I have made it more readable in https://stackoverflow.com/a/46484848/1389680. – Robert Pollak Sep 29 '17 at 08:36
1

This works an absolute treat: @RobertPollak's abbreviated derived instructions offer the TLDR for this, tx – ptim Nov 11 '17 at 12:08
1

Is there an advantage to this method vs this other answer: https://stackoverflow.com/a/44566552/3195477 ? The other one appears to be much simpler to enact, but is it less useful for some reason? – StayOnTarget Sep 26 '18 at 12:08
66

This is an ingenious solution, but it is an interesting use of the word 'simple' to describe as such this ten-step workaround for the absence of what should be an atomic action in any system intended to track the history of legitimately copyable things. – sdenham Dec 18 '18 at 00:09
Very cool. Here's the script version that can be used from a terminal: https://stackoverflow.com/a/53849613/521799 – Lukas Eder Dec 19 '18 at 13:24
26

Beware of this method if you anticipate ever wanting/needing to use git rebase on these commits. When I tried this method of preserving history, git viewed the commits made by this method as conflicting with each other during a rebase, and they needed to be manually merged. The conflict resolution process ended up losing the commit history that I was attempting to save in the first place. – zwalker Feb 12 '19 at 20:18
Works on Windows as well (git 2.20.1 bash). – Igor Mar 04 '19 at 18:14
4

I remember this worked for me in the past, but currently it doesn't. The file which comes from the merged branch gets its "starting point" in history from the merge commit. Tried on a few Git versions, including 2.24.0, on Windows 7. Tried using the script from @LukasEder too. Same result. – volvpavl Nov 06 '19 at 13:40
Do you know of some git plumbing commands that we could use to specify the parent of the file at the git object level? – Natim Mar 20 '20 at 10:18
2

For the sake of completeness, I believe the approach you've described is also documented in this article: [How to duplicate a file while preserving git line history](https://devblogs.microsoft.com/oldnewthing/20190919-00/?p=102904) – Bass Mar 31 '20 at 12:06
2

Thank you, @Bass, the link you gave shows an even better solution than the one shown here! I have incorporated it into my "more readable" answer at https://stackoverflow.com/a/46484848/1389680 . – Robert Pollak May 19 '20 at 13:48
@volvpavl Have you perhaps found a solution since that works? – P Varga Jul 25 '20 at 05:00
This worked for me with git 2.27 – TonyH Sep 15 '20 at 16:31
You only need one of the moves, not the second one. When you do the merge, don't commit it, and resurrect the original file. – Clément Dec 04 '20 at 19:21
Is this solution working for anyone? I haven't been able to use it, as history doesn't seem to be preserved in the new file – MichaelGofron Mar 11 '22 at 19:26
2

There is an improved version of this at https://stackoverflow.com/a/46484848/695591 , which does not generate a merge conflict and hence can be rebased after the fact. – Clément Jun 22 '22 at 23:56
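For readers who cannot follow the links, here is a rough sketch of one conflict-free variant. It is a reconstruction under assumptions, not a quote from the linked answer or article, whose exact commands may differ: the rename and the restore happen on a side branch (the name dup is hypothetical), which is then merged back with --no-ff.

```shell
#!/bin/sh
# Throwaway repository so nothing touches a real project.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
git config commit.gpgsign false

printf 'line one of the original file\nline two of the original file\n' > foo
git add foo
git commit -q -m "add foo"
orig=$(git rev-parse HEAD)

git checkout -q -b dup                    # hypothetical side-branch name
git mv foo bar
git commit -q -m "duplicate foo to bar"
git checkout -q HEAD~ -- foo              # bring the original path back
git commit -q -m "restore foo"

git checkout -q -                         # return to the starting branch
git merge -q --no-ff -m "merge duplicated file" dup
git branch -q -d dup

# Commit that git blame credits for the first line of each file.
blame_foo=$(git blame --porcelain foo | head -n 1 | cut -d ' ' -f 1)
blame_bar=$(git blame --porcelain bar | head -n 1 | cut -d ' ' -f 1)
echo "orig=$orig"
echo "foo=$blame_foo"
echo "bar=$blame_bar"
```

The merge commit matters here: foo is unchanged relative to the first parent and bar is unchanged relative to the second, so blame can walk each file back along its own side.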

score 94 · Accepted Answer · edited Sep 24 '22 at 08:23

Unlike Subversion, Git does not have a per-file history. If you look at the commit data structure, it only points to the previous commits and to the new tree object for this commit. The commit object stores no explicit information about which files the commit changed, nor about the nature of those changes.
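To see this for yourself, here is a small sketch that prints a raw commit object in a scratch repository. The file name, identity, and commit messages are placeholders chosen for the illustration.

```shell
#!/bin/sh
# Throwaway repository so nothing touches a real project.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "hello" > A.txt
git add A.txt
git commit -q -m "first"
echo "more" >> A.txt
git commit -q -a -m "second"

# Pretty-print the commit object that HEAD points to.
commit_text=$(git cat-file -p HEAD)
printf '%s\n' "$commit_text"

# Collect only the header field names (everything before the blank line).
fields=$(printf '%s\n' "$commit_text" |
  awk 'NF == 0 { exit } { print $1 }' | paste -s -d ' ' -)
echo "header fields: $fields"
```

The header names a tree, a parent, an author, and a committer; there is no list of changed paths, which is why renames and copies have to be inferred later by comparing trees.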

The tools to inspect changes can detect renames based on heuristics. E.g. git diff has the option -M that turns on rename detection. So in case of a rename, git diff might show you that one file has been deleted and another one created, while git diff -M will actually detect the move and display the change accordingly (see man git diff for details).

So in git this is not a matter of how you commit your changes but how you look at the committed changes later.
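A short sketch of that point, run in a scratch repository with placeholder names: the same pair of commits is reported as a delete plus an add, or as a rename, depending only on the options given to git diff. Rename detection is switched off explicitly with --no-renames for the first call, since some Git versions and configurations enable it by default.

```shell
#!/bin/sh
# Throwaway repository so nothing touches a real project.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
git config commit.gpgsign false

mkdir dir1 dir2
printf 'some content\nthat stays the same\n' > dir1/A.txt
git add dir1/A.txt
git commit -q -m "add dir1/A.txt"
git mv dir1/A.txt dir2/A.txt
git commit -q -m "move A.txt to dir2"

# Same two commits, two ways of looking at them.
without=$(git diff --no-renames --name-status HEAD~ HEAD)
with=$(git diff -M --name-status HEAD~ HEAD)

echo "without rename detection:"
printf '%s\n' "$without"
echo "with -M:"
printf '%s\n' "$with"
```

Nothing about the commits themselves differs between the two calls; only the analysis applied when viewing them does.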

edited Sep 24 '22 at 08:23

Daniel Böhmer

answered Jun 05 '13 at 10:43

CliffordVienna

8

My reproducible example on http://pastebin.com/zEREyeaL shows that `git blame` also knows the rename history - without using any option. Doesn't this tell us that the history is stored in some way? – Daniel Alder Apr 26 '14 at 11:47
9

@DanielAlder No. Like `git diff -M` this is just smart analysis of the tree objects. From the `git blame` man page: "The origin of lines is automatically followed across whole-file renames (currently there is no option to turn the rename-following off)." – CliffordVienna Apr 26 '14 at 11:59
28

Why does `git mv` exist then? – skirsch Jul 09 '18 at 09:24
4

@skirsch convenience – CliffordVienna Jul 10 '18 at 11:29
13

And unlike Mercurial as well. Mercurial has history preserving copy. – Omnifarious Sep 15 '18 at 01:18

score 40 · Answer 3 · answered Jun 15 '17 at 11:39

Simply copy the file, add and commit it:

cp dir1/A.txt dir2/A.txt
git add dir2/A.txt
git commit -m "Duplicated file from dir1/ to dir2/"

Then the following commands will show the full pre-copy history:

git log --follow dir2/A.txt

To see inherited line-by-line annotations from the original file use this:

git blame -C -C -C dir2/A.txt

Git does not track copies at commit time; instead, it detects them when inspecting history with e.g. git blame and git log.
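As a rough check of the above, the following sketch repeats the copy in a scratch repository and compares a plain git blame with the -C -C -C form. The file contents, identity, and first commit message are invented for the example; the lines are deliberately long because blame only reports copied lines above a minimum size.

```shell
#!/bin/sh
# Throwaway repository so nothing touches a real project.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
git config commit.gpgsign false

mkdir dir1 dir2
cat > dir1/A.txt <<'EOF'
first line of the original file, long enough to be matched
second line of the original file, also reasonably long
third line of the original file, so the copy is detectable
EOF
git add dir1/A.txt
git commit -q -m "add dir1/A.txt"
orig=$(git rev-parse HEAD)

cp dir1/A.txt dir2/A.txt
git add dir2/A.txt
git commit -q -m "Duplicated file from dir1/ to dir2/"
copy=$(git rev-parse HEAD)

# Commit subjects reached when following the copy back through history.
follow=$(git log --follow --format=%s -- dir2/A.txt)

# Commit credited for the first line, without and with copy detection.
plain=$(git blame --porcelain dir2/A.txt | head -n 1 | cut -d ' ' -f 1)
harder=$(git blame -C -C -C --porcelain dir2/A.txt | head -n 1 | cut -d ' ' -f 1)

printf '%s\n' "$follow"
echo "orig=$orig copy=$copy"
echo "plain blame=$plain"
echo "blame -C -C -C=$harder"
```

Without the -C options the lines are credited to the commit that created the copy; with them, blame should look through the copy to the commit that first added dir1/A.txt.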

Most of this information comes from the answers here: Record file copy operation with Git

answered Jun 15 '17 at 11:39

Jakob Buron

1

This is not very useful because `-C -C -C` searches *the entire repo* which is just incredibly slow unless your repo is tiny. – Timmmm Jul 12 '22 at 13:45
Can I do `cp dir1/A.txt dir2/B.txt` where file name is changed? – alper Dec 07 '22 at 18:16

score 28 · Answer 4 · edited Apr 05 '19 at 12:28

I've slightly modified Peter's answer here to create a reusable, non-interactive shell script called git-split.sh:

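#!/bin/sh

# Require exactly two arguments: the existing file and the name of the copy.
if [ "$#" -ne 2 ] ; then
  echo "Usage: git-split.sh original copy" >&2
  exit 1
fi

# Rename the original to the target name and remember that commit.
git mv "$1" "$2"
git commit -n -m "Split history $1 to $2 - rename file to target-name"
REV=$(git rev-parse HEAD)
# Go back one commit and rename the original to a temporary name instead.
git reset --hard HEAD^
git mv "$1" temp
git commit -n -m "Split history $1 to $2 - rename source-file to temp"
# Merge the two renames; the conflict is resolved by keeping both files.
git merge "$REV"
git commit -a -n -m "Split history $1 to $2 - resolve conflict and keep both files"
# Give the temporary file its original name back.
git mv temp "$1"
git commit -n -m "Split history $1 to $2 - restore name of source-file"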

edited Apr 05 '19 at 12:28

Radon8472

answered Dec 19 '18 at 10:49

Lukas Eder

3

Great solution. I had problems using it with files containing spaces, so I modified your code to fix this. – Radon8472 Apr 05 '19 at 11:10
Hm, with git `v2.17.1` this leaves me with a newly committed file `$2` - does this approach still work for you? – frans Nov 25 '20 at 13:50
Hm, even with `v2.1.4` the log of the copied file is empty. – frans Nov 25 '20 at 14:17
@frans: It worked at the time. Feel free to edit with improvements if you see any... – Lukas Eder Nov 25 '20 at 18:20
Didn't find a solution yet. I'm afraid this approach does not work any more. – frans Nov 26 '20 at 10:59
@frans: Is this a git problem or a shell problem? – Lukas Eder Nov 26 '20 at 11:29
Seems to be git - after a merge one of the 'new' files loses its history – frans Nov 26 '20 at 12:33
Following this I ended up with 4 commits (commit 2, commit 1, merge commit, commit 4). But I was able (git 2.30.1) to squash them with a rebase. It's possible a merge --squash would reduce the number of commits in this dance. – Jason May 17 '22 at 18:55
Suggestion: I think it should exit non-zero in case it's just showing the help because of receiving the wrong number of arguments. – Raúl Salinas-Monteagudo May 30 '22 at 14:02
I've modified it a little bit for accepting more than two arguments (when you want to split the original file in several files) https://gist.github.com/Javrd/295a93a74fd78b7aaa93954fd7bfec03 – javrd Apr 26 '23 at 13:30

score 10 · Answer 5 · answered Jan 29 '14 at 11:29

For completeness, I would add that, if you wanted to copy an entire directory full of controlled AND uncontrolled files, you could use the following:

git mv old new
git checkout HEAD old

The uncontrolled files will be copied over, so you should clean them up:

git clean -fdx new

answered Jan 29 '14 at 11:29

Hervé

3

As far as I can see, the first commands will *not copy* uncontrolled files (but move them), and what's the point of moving them if you remove them with the 'clean' command afterwards? – hans_meine Aug 19 '14 at 05:24
@hans_meine you're right, one might as well clean first and move after. – Hervé Jan 30 '16 at 21:09
15

Only the original file(s) stays connected to history when I do this, the copy is considered a new file with a fresh history. This doesn't answer the question :( – Griknok Oct 13 '16 at 00:48

score 2 · Answer 6 · answered May 16 '17 at 18:47

In my case, I made the change on my hard drive (cut and pasted about 200 folders/files from one path in my working copy to another) and used SourceTree (2.0.20.1) to stage both detected changes (one add, one remove). As long as I staged the add and the remove together, SourceTree automatically combined them into a single change with a pink R icon (rename, I assume).

I did notice that because I had such a large number of changes at once, SourceTree was a little slow detecting all the changes, so some of my staged files look like just adds (green plus) or just deletes (red minus), but I kept refreshing the file status and kept staging new changes as they eventually popped up, and after a few minutes, the whole list was perfect and ready for commit.

I verified that the history is present, as long as when I look for history, I check the "Follow renamed files" option.

score 0 · Answer 7 · answered Sep 18 '19 at 14:45

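This process preserves history, but it is a bit of a workaround (arquivos, branch-copia-arquivos, and commit below are placeholders for your files, the branch holding the copy, and the commit that removed the files):

# on a new branch, move the files to their new location
$: git mv arquivos && git commit

# on the original branch, remove the original files
$: git rm arquivos && git commit

# merge and fix the conflicts
$: git merge branch-copia-arquivos

# back on the original branch, revert the commit that removed the files
$: git revert commit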
