When I move a file in git using git-mv, the status shows that the file has been renamed, and even if I alter some portions git still considers it to be almost the same file (which is good because it lets me follow its history).

When I copy a file the original file has some history I'd like to associate with the new copy.

I have tried moving the file then trying to re-checkout in the original location - once moved git won't let me checkout the original location.

I have tried doing a filesystem copy and then adding the file - git lists it as a new file.

Is there any way to make git record a file copy operation in a similar way to how it records a file rename/move where the history can be traced back to the original file?

Hexdoll

3 Answers

If for some reason you cannot turn on copy detection as in Jakub Narębski's answer, you can force Git to detect the history of the copied file in three commits:

  • Instead of copying, switch to a new branch and move the file to its new location there.
  • Re-add the original file there.
  • Merge the new branch to the original branch with the no-fast-forward option --no-ff.

Credits to Raymond Chen. What follows is his procedure. Say the file is named SomeFile.cpp, and you want the duplicate to be named SomeOtherFile.cpp:

origFile=SomeFile.cpp
copyName=SomeOtherFile.cpp
branchName=duplicate-SomeFile

git checkout -b "$branchName" # create and switch to branch

git mv "$origFile" "$copyName" # make the duplicate
git commit -m "duplicate $origFile to $copyName"

git checkout HEAD~ -- "$origFile" # bring back the original
git commit -m "restore duplicated $origFile"

git checkout - # switch back to source branch
git merge --no-ff "$branchName" -m "Merge branch $branchName" # merge dup into source branch

Note that this can be executed on Windows in Git Bash.
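
Since several commenters report mixed results, here is a minimal sketch for checking the outcome yourself. It replays the procedure in a scratch repository and then inspects the copy. The temporary directory, the `g` helper, the demo identity and the five-line file content are assumptions made only for this illustration; they are not part of Raymond Chen's procedure.

```shell
# Scratch repository; "g" is a hypothetical helper that runs git inside it
# with a throwaway identity so the sketch does not depend on global config.
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid "$@"; }

g init -q
printf 'line %s\n' 1 2 3 4 5 > "$repo/SomeFile.cpp"
g add SomeFile.cpp
g commit -q -m "add SomeFile.cpp"

# The three commits from the procedure above
g checkout -q -b duplicate-SomeFile
g mv SomeFile.cpp SomeOtherFile.cpp
g commit -q -m "duplicate SomeFile.cpp to SomeOtherFile.cpp"
g checkout -q HEAD~ -- SomeFile.cpp
g commit -q -m "restore duplicated SomeFile.cpp"
g checkout -q -
g merge -q --no-ff duplicate-SomeFile -m "Merge branch duplicate-SomeFile"

# Inspect the copy: --follow is needed to look past the rename,
# while blame follows whole-file renames by default.
g log --follow --format=%s -- SomeOtherFile.cpp
g blame -s SomeOtherFile.cpp
```

If the procedure worked, the log of the copy should reach back to the commit that first added SomeFile.cpp, and blame should attribute the copy's lines to that commit. A plain `git log -- SomeOtherFile.cpp` without `--follow` stops at the rename, which may explain some of the "it didn't work" reports.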


2020-05-19: The above solution has the advantages of not changing the log of the original file, not creating a merge conflict, and being shorter. The former solution had four commits:

  • Instead of copying, switch to a new branch and move the file to its new location there.
  • Switch to the original branch and rename the file.
  • Merge the new branch into the original branch, resolving the trivial conflict by keeping both files.
  • Restore the original filename in a separate commit.

(Solution taken from https://stackoverflow.com/a/44036771/1389680.)

Robert Pollak
  • 10
    Simplicity, brevity, 100%... This answer is public service... upvoting everything in sight – ptim Nov 11 '17 at 12:06
  • 1
    What the difference between `move` and `rename`? – vovan Mar 28 '19 at 17:39
  • @vovan Are you referring to the fact that in bash you would use `mv` for both operations? I was using 'move' for the case that may involve changing the file's directory, and 'rename' for the case where it doesn't. – Robert Pollak Mar 29 '19 at 09:27
  • 5
    I tried to follow this (new) recipe and it didn't work. It might help if you showed the actual commands. – Greg Lindahl Jul 22 '20 at 07:15
  • 3
    @RobertPollak I have tried various versions of this but they didn't work. By "move the file", do you mean `git mv orig new`? By "readd the original", do you mean `cp new orig && git add orig`? – P Varga Jul 25 '20 at 06:00
  • @GregLindahl, the linked blog entry by Raymond Chen gives the actual commands. I consider this too much detail here. – Robert Pollak Aug 17 '20 at 13:34
  • @ᆼᆺᆼ, Yes, that's what I meant by 'move' and 'readd'. – Robert Pollak Aug 17 '20 at 13:50
  • So I've tried and this doesn't work for me... one of the files will have its history begin at the point it was `git mv`'d, even if both were `git mv`'d on different branches and then those branches merged together with `--no-ff` – P Varga Aug 18 '20 at 09:11
  • @ᆼᆺᆼ, do the commands given in Raymond Chen's blog post work for you? – Robert Pollak Aug 18 '20 at 10:01
  • 1
    @RobertPollak No, and I've also tried `git cp` from `git-extras`, same result... Is it possible these methods stopped working with a certain git release? – P Varga Aug 19 '20 at 09:49
  • @ᆼᆺᆼ, this could be. Which Git version do you use? The current release is 2.28.0 from 2020-07-27. I have successfully used Raymond Chen's method in one of my projects yesterday with git 2.20 (from current stable Debian release 10 "buster"). Unfortunately, I currently don't have time for more testing. – Robert Pollak Aug 19 '20 at 10:47
  • @GregLindahl, which Git version did you use? – Robert Pollak Aug 19 '20 at 10:53
  • I am using git `2.28.0.windows.1` (the most recent version today) and the commands did not work. The merge just deletes the original files, just applying the `git mv` command... is there any default settings of Git that could make it fail? – ymoreau Sep 25 '20 at 14:54
  • Can someone with problems please reproduce them in a fresh repo, then post the corresponding command history? – Robert Pollak Sep 25 '20 at 15:42
  • @ymoreau Does the four-commit version work for you? – Robert Pollak Sep 25 '20 at 15:43
  • I did not try, the history was cut by older git-mv anyway (while they were actually simple move), so that confirmed something I read elsewhere about git not recording the mv anyway, and I dropped it. – ymoreau Sep 28 '20 at 07:27
  • 10
    The idea that you have to create a branch, commit there, then merge that back ... and the top comment is "Simplicity and brevity" ... is mind boggling. In Mercurial you'd just do `hg cp` (or `hg cp -A` if you'd already copied it yourself). What a shame git won the VCS popularity contest. – Arthur Tacca Feb 06 '21 at 11:07
  • @ArthurTacca, feel free to use whatever you want. For me, Git also won the functionality contest. I thoroughly tested both Git and Mercurial before switching from SVN. And let me point out the alternative solution of simply using `--find-copies-harder` instead of crafting these commits. – Robert Pollak Feb 07 '21 at 11:36
  • @JohnK Ok, nice, well, it's not nice, but real ugly (thanx to Linus), anyway that works for a single file, so that's nice =) But what if I'm refactoring some ugly legacy code and have to split one file, that contains X classes into X different files.. what then? – Eddy Shterenberg May 11 '21 at 00:54
  • @EddyShterenberg What have you tried? What was the outcome? – Robert Pollak May 11 '21 at 09:56
  • @RobertPollak, I've tried the exact solution, that posted in the answer, i.e. create branch "dup", rename SINGLE file+commit, restore original file+commit, merge to src branch with --no-ff. My question is: if I have to split file into, say 5 files, then following this scenario I should repeat this 5 times. So, since I'm a developer - I'm lazy and trying to find more convenient/easy way to do that. P.S. google didn't help much. Also it would be nice to have the split as a single commit in the history. In TFS I just branch the file 5 times, clean the 5 copies and check-in once. – Eddy Shterenberg May 11 '21 at 21:26
  • @EddyShterenberg Have you also tried using `-C` (with a single split commit) instead? – Robert Pollak May 12 '21 at 14:14
  • @RobertPollak, isn't `-C` an option of `log` and `blame` commands? I saw it can multiply over and over - . I would like to see the history line of any random file in my repository in a regular history view (i.e. UI of some kind, be it Git GUI/TortoiseGit/Visual Studio/What Ever) without having second thoughts, like "hey, may be that file was split from another file, so I'll just switch from my IDE to console and check it". If it's not possible (while staying sane) it's an acceptable fact too, then I'll just stop googling and accept the reality =) – Eddy Shterenberg May 12 '21 at 21:51
  • @RobertPollak I was able to carry out these steps and everything worked as advertised. However, what I'm noticing now is that both resulting files not only share past commit history (which we wanted), but new commits on either of the files shows up in the commit history of the other file. That is, going forward, these files will have identical histories, including new commits, even though the goal is for them to diverge. Have you encountered this and if so did you find a workaround? Thanks! – sparc_spread Sep 01 '21 at 21:34
  • a solution that was taken from the issue that is a duplicate of this one XD, nice – Maxwell s.c Oct 08 '21 at 15:34
  • :-) This was asked earlier, so the other got duplicated. – Robert Pollak Oct 13 '21 at 08:13
  • 2
    After I finished these steps I got the same issue as if I had just done `cp oldFile newFile` it showed the file had just been added directly without any relation with the old file. Any ideas why that might be the case? – MichaelGofron Mar 11 '22 at 01:18
  • @MichaelGofron you should see the original file with full history, but the 2nd one as renamed. You will not see a file history directly on renamed files (without following renames). Blame should work fine on both files by default. – Nux Jun 11 '23 at 18:55
Git does neither rename tracking nor copy tracking, which means it doesn't record renames or copies. What it does instead is rename and copy detection. You can request rename detection in git diff (and git show) by using the -M option, you can request additional copy detection among the files changed in the same commit by using the -C option instead, and you can request more expensive copy detection among all files with -C -C. See the git-diff manpage.

-C -C implies -C, and -C implies -M.

-M is a shortcut for --find-renames, -C means --find-copies and -C -C can also be spelled out as --find-copies-harder.

You can also configure git to always do rename detection by setting diff.renames to a boolean true value (e.g. true or 1), and you can request git to do copy detection too by setting it to copy or copies. See the git-config manpage.

Check also the -l option to git diff and the related config variable diff.renameLimit.
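
As a rough illustration of the options above, the sketch below copies a file with plain `cp` in a scratch repository and compares what `git show` reports with and without `--find-copies-harder`. The temporary directory, the `g` helper, the demo identity and the file names a.txt and b.txt are assumptions made for the example only.

```shell
# Scratch repository; "g" is a hypothetical helper that runs git inside it
# with a throwaway identity.
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid "$@"; }

g init -q
printf 'line %s\n' 1 2 3 4 5 > "$repo/a.txt"
g add a.txt
g commit -q -m "add a.txt"

# Plain filesystem copy, committed as-is
cp "$repo/a.txt" "$repo/b.txt"
g add b.txt
g commit -q -m "copy a.txt to b.txt"

# Without copy detection, b.txt is reported as an added file (status A)
g show --name-status --format= HEAD

# a.txt was not modified in this commit, so a single -C would not consider it;
# --find-copies-harder (-C -C) also looks at unmodified files and reports
# b.txt as a copy of a.txt (status C)
g show --name-status --format= --find-copies-harder HEAD

# Optional: make diff, show and log behave as if -C were given in this repo
g config diff.renames copies
```

Nothing about the copy is stored in the commit itself. The same commit is simply displayed differently depending on the detection options in effect when you look at it.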


Note that git log <pathspec> works differently in Git than you might expect: here <pathspec> is a set of path limiters, where a path can also be a (sub)directory name. It filters and simplifies history before rename and copy detection comes into play. If you want to follow renames and copies, use git log --follow <filename> (which currently is a bit limited, and works only for a single file).
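
To make the difference concrete, here is a small sketch that renames a file in a scratch repository and compares the two forms of `git log`. The temporary directory, the `g` helper, the demo identity and the file names old.txt and new.txt are assumptions for illustration.

```shell
# Scratch repository; "g" is a hypothetical helper that runs git inside it
# with a throwaway identity.
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid "$@"; }

g init -q
printf 'line %s\n' 1 2 3 4 5 > "$repo/old.txt"
g add old.txt
g commit -q -m "add old.txt"

g mv old.txt new.txt
g commit -q -m "rename old.txt to new.txt"

# Path-limited log: only commits touching the path new.txt are listed,
# so the history appears to start at the rename
g log --format=%s -- new.txt

# --follow continues across the rename and also lists the commit that
# added the file under its old name
g log --format=%s --follow -- new.txt
```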

Robert Pollak
Jakub Narębski
  • 2
    @allyourcode: What you are confused about? To turn on copy detection by default you set `diff.renames` to `copies` (e.g. '`git config diff.renames copies`'). I agree that it is a bit counterintuitive. – Jakub Narębski Jul 25 '10 at 20:51
  • One section I can't seem to parse is "and you can request to do by default also rename detection". Are you saying there's four values that diff.renames can use (true, 1, copy, copies), and that they all do the same thing? – allyourcode Jul 26 '10 at 02:43
  • 1
    @allyourcode: I'm sorry, I haven't noticed this. Fixed now, thanks. – Jakub Narębski Jul 26 '10 at 07:15
  • Ok, so Git does not record renames or copies. Now I am also interested in the question whether Git then stores everything duplicated, or whether it uses an intelligent de-duplication algorithm based on file-part hashes or similar - so that the data that was copied is stored only once in the repository? – peschü Nov 23 '14 at 11:48
  • 4
    @peschü: Git uses content-addressed object database as a repository storage. File contents is stored in 'blob' contents under address that is SHA-1 hash of contents (well, type+length+contents). This means that given contents is stored only once. Nb. this automatic deduplication was the reason behind creating "bup" backup system, using git pack format. – Jakub Narębski Nov 23 '14 at 15:36
  • 3
    Unlike the solution below, this doesn't work with change tracking in a range. Git log allows a range argument (`git log -L123,456:file.xyz`) that properly follows renames, but not copies, and you can't pass --follow in that case; also, AFAICT, this doesn't work with git blame. – Clément Jan 10 '20 at 20:48
  • Unfortunately, the packaged `gitk` has no switch to activate `--find-copies-harder`, see https://stackoverflow.com/q/63939606/1389680 . – Robert Pollak Sep 25 '20 at 15:47
This builds on the answer from Robert.

For my use case, I needed to move several directories from one implementation to another (with all that entails for file include paths, unit tests, etc), and I found it challenging & time consuming to move each individual file.

My solution includes prompts for the origin & destination paths.

My solution also deletes the temporary branch that was created for this purpose (if the script succeeds to the end).

Caveats:

  1. The script will attempt to make a new directory for the input you provide for the second prompt (the new destination).
  2. Both this and the original solution merge history into the CURRENT BRANCH. I suggest that you start with a new branch, or at least run git stash push (the older git stash save is deprecated) if you have any local modifications.
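
#!/usr/bin/env bash
# abort on the first failing command, so the temporary branch is only deleted after a successful run
set -e

branchName=chore/temp/duplicate-file-history-by-script
currentBranchName="$(git branch --show-current)"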

function copy_git_history() {
    targetToCopy=$1
    newDestination=$2

    echo "copying $targetToCopy to $newDestination and restoring its history"

    git mv "$targetToCopy" "$newDestination"
    git commit -m "duplicating $targetToCopy to $newDestination to retain git history"

    git checkout HEAD~ "$targetToCopy"
    git commit -m "restoring moved file $targetToCopy to its original location"
}

### USER PROMPTS ###

echo "proceeding to copy files to current branch.  Please make sure you are prepared to have the current git branch modified: $currentBranchName"
# spacing to make things easier to read
printf "\n"

echo "Please enter the path to the file(s) you wish to duplicate, relative to $PWD"
read -r originalFileLoc

echo "Please enter the new path where you wish to copy the original file(s)"
read -r newFileLoc

### END: USER PROMPTS ###

# create the new branch to store the changes
git checkout -b $branchName

# create the duplicate file(s)
if [[ -d "$originalFileLoc" ]]
then
    echo "copying files from $originalFileLoc to $newFileLoc"
    mkdir -p "$newFileLoc"

    # iterate over a quoted glob so that paths containing spaces are not split into several words
    for file in "$originalFileLoc"/*
    do
      copy_git_history "$file" "$newFileLoc"
    done
else
  copy_git_history "$originalFileLoc" "$newFileLoc"
fi

# switch back to source branch
git checkout -
# merge the history back into the source branch to retain both copies
git merge --no-ff $branchName -m "Merging file history for copying $originalFileLoc to $newFileLoc"

# delete the branch we created for history tracking purposes
git branch -D $branchName
zedd45