One of my teammates created a Java file by copying it from another file. The new file now looks like an edit of the original file in the Git repository. The two files are completely different, and I want the new file to have its own history, not the copied history. The pull request shows the file as copied, with the original's history of changes associated with it. Does anyone know how to remove the history associated with copying? Can the wanted behavior be achieved through Git?
-
Can you give the exact commands to reproduce this? – merlin2011 Jul 12 '16 at 23:56
-
Copy a file from folder /src/somepackage to src/anotherpage and rename the file. Git will detect that it is a copy, and the file will have history – Dan Hunex Jul 13 '16 at 00:05
-
Potential Duplicate: http://stackoverflow.com/q/15031576/2570538 – srage Jul 13 '16 at 00:08
-
I tried exactly this, and could not reproduce the issue. – merlin2011 Jul 13 '16 at 00:08
1 Answer
In Git, files do not have history. History is attached to commits: commits have parent commits, which have more parents, and so on. Each of these commits has an ID (a "true name" SHA-1 hash), an author and committer (name, email, time-stamp), and a log message as well.
Of course, everyone wants to see file history. So Git fakes it up: it compares one historical commit to another newer commit, diffing all the files in each commit. When it diffs the files, it may decide that the earlier commit's `dir/README.txt` was copied or renamed to the new commit's `python/sourcefile.py`. It will therefore tell you that the "history" of `python/sourcefile.py` is that it was `dir/README.txt` before.
That history is completely imaginary, unless you don't want it to be, in which case, it is the utter, absolute truth! It's your decision whether to believe Git.
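To see that the rename is computed at diff time rather than stored anywhere, here is a minimal sketch. It assumes a throwaway repository in a temporary directory and made-up file contents; only the two path names come from the text above.

```shell
#!/bin/sh
# Sketch: the same two commits, diffed with and without rename detection.
set -e
repo=$(mktemp -d)            # throwaway repository (assumption for the demo)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

mkdir dir
i=1
while [ "$i" -le 40 ]; do echo "line $i"; i=$((i + 1)); done > dir/README.txt
git add dir/README.txt
git commit -q -m "add README"

mkdir python
git mv dir/README.txt python/sourcefile.py
git commit -q -m "move file"

# Nothing about the move is recorded in the commit; the diff decides.
with_detection=$(git diff -M --name-status HEAD~1 HEAD)
without_detection=$(git diff --no-renames --name-status HEAD~1 HEAD)

echo "with -M:"
echo "$with_detection"       # one R (rename) entry
echo "with --no-renames:"
echo "$without_detection"    # a D (delete) entry and an A (add) entry
```

Both outputs describe exactly the same pair of commits; the only difference is whether you asked Git to pair the deleted file with the added one.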
You also get some control knobs:
- `diff.renames`: if `true`, ask Git to detect renamed files. If `false`, ask Git not to detect renames. If `copies`, ask Git to detect copied files as well as renamed files. The default is now `true` but was `false` in older (pre-2.9) versions of Git.
- `diff.renameLimit`: how many files should Git treat as candidates for rename detection. Rename and copy detection is somewhat slow and memory-intensive so Git used to have a limit of 500, then 1000, then 2000, as its defaults here. Setting this to 0 means "as many as Git can manage."
- `diff.algorithm` (and many external diff driver configuration options): see the `git config` documentation.
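As a rough illustration of the `diff.renames` knob, the sketch below sets it per repository and reruns the same plain `git diff`. The file names `Original.java` and `Copied.java` are hypothetical, and the original is deliberately touched in the same commit, because copy detection without `--find-copies-harder` only considers files modified in that commit as copy sources.

```shell
#!/bin/sh
# Sketch: how diff.renames changes what a plain `git diff` reports.
set -e
repo=$(mktemp -d)            # throwaway repository (assumption for the demo)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

mkdir -p src/somepackage src/anotherpage
i=1
while [ "$i" -le 40 ]; do echo "line $i"; i=$((i + 1)); done > src/somepackage/Original.java
git add .
git commit -q -m "add original"

# Copy the file under a new name and also touch the original.
cp src/somepackage/Original.java src/anotherpage/Copied.java
echo "one more line" >> src/somepackage/Original.java
git add .
git commit -q -m "copy file and touch the original"

git config diff.renameLimit 0      # "as many as Git can manage"

git config diff.renames false      # no rename or copy detection
plain=$(git diff --name-status HEAD~1 HEAD)

git config diff.renames copies     # detect renames and copies
copies=$(git diff --name-status HEAD~1 HEAD)

echo "diff.renames=false:"
echo "$plain"                      # the new file is listed as an A (add)
echo "diff.renames=copies:"
echo "$copies"                     # the new file is listed as a C (copy)
```

Add `--global` to the `git config` calls if you want the setting everywhere rather than in one repository.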
Besides these, there are command-line switches for the same items, plus a "match threshold" (`-M`) value that defaults to "50% similar": files at least "50% similar" were renamed, files less than "50% similar" were not.
(Note that some commands, such as `git merge`, run their internal `git diff`s with some of these settings set differently from their defaults. There are controls for these as well.)
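The match threshold can be tried out with a sketch like the following. The file names and contents are invented for the demonstration: a 20-line file is renamed and 6 of its lines are replaced, which leaves it roughly 70% similar, so it counts as a rename at `-M50%` but not at `-M90%`.

```shell
#!/bin/sh
# Sketch: the same change judged against two different -M thresholds.
set -e
repo=$(mktemp -d)            # throwaway repository (assumption for the demo)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

i=1
while [ "$i" -le 20 ]; do echo "original line number $i"; i=$((i + 1)); done > old.txt
git add old.txt
git commit -q -m "add old.txt"

# Rename the file and rewrite its first 6 lines.
git mv old.txt new.txt
i=1
while [ "$i" -le 20 ]; do
  if [ "$i" -le 6 ]; then
    echo "something else entirely $i"
  else
    echo "original line number $i"
  fi
  i=$((i + 1))
done > new.txt
git add new.txt
git commit -q -m "rename and edit"

loose=$(git diff -M50% --name-status HEAD~1 HEAD)
strict=$(git diff -M90% --name-status HEAD~1 HEAD)

echo "-M50%:"
echo "$loose"                # reported as a rename (R)
echo "-M90%:"
echo "$strict"               # reported as a delete (D) plus an add (A)
```

Raising the threshold is one way to stop Git from pairing up files that merely share some boilerplate.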
If the file has the same pathname
In your case, it sounds like what happened was:
- An older commit, call it `$OLD`, had file `src/foo.java`.
- Somewhere along the way `src/foo.java` was removed entirely (and committed, or not committed, it won't matter here).
- Then later someone wrote a new, or copied something else to, `src/foo.java`, and committed; call this `$NEW`.
- Now comparing `$OLD` to `$NEW` insists on comparing the unrelated files.
Here you are semi-stuck: Git believes that since `src/foo.java` and `src/foo.java` have the same name, it should compare their contents. However, there's one more control knob, as `git diff` has a `-B` switch:
-B[<n>][/<m>], --break-rewrites[=[<n>][/<m>]]
Break complete rewrite changes into pairs of delete and create. This serves two purposes:
It affects the way a change that amounts to a total rewrite of a file not as a series of deletion and insertion mixed together with a very few lines that happen to match textually as the context, but as a single deletion of everything old followed by a single insertion of everything new, and the number m controls this aspect of the `-B` option (defaults to 60%). `-B/70%` specifies that less than 30% of the original should remain in the result for Git to consider it a total rewrite (i.e. otherwise the resulting patch will be a series of deletion and insertion mixed together with context lines).
When used with `-M`, a totally-rewritten file is also considered as the source of a rename (usually `-M` only considers a file that disappeared as the source of a rename), and the number n controls this aspect of the `-B` option (defaults to 50%). `-B20%` specifies that a change with addition and deletion compared to 20% or more of the file's size are eligible for being picked up as a possible source of a rename to another file.
So, by supplying `-B` and turning on rename and copy detection, you may get the output you want.
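Here is a minimal sketch of the same-pathname case. It assumes a throwaway repository and invented file contents, skips the intermediate removal (which, as noted, does not matter), and stores the two commit hashes in `OLD` and `NEW`. The files are kept reasonably large on purpose, since Git does not appear to break very small files.

```shell
#!/bin/sh
# Sketch: unrelated contents at the same path, diffed with and without -B.
set -e
repo=$(mktemp -d)            # throwaway repository (assumption for the demo)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

mkdir src
i=1
while [ "$i" -le 40 ]; do echo "old implementation line $i"; i=$((i + 1)); done > src/foo.java
git add src/foo.java
git commit -q -m "old foo.java"
OLD=$(git rev-parse HEAD)

# Replace the file with completely unrelated content at the same path.
i=1
while [ "$i" -le 40 ]; do echo "unrelated replacement text $i"; i=$((i + 1)); done > src/foo.java
git add src/foo.java
git commit -q -m "unrelated foo.java at the same path"
NEW=$(git rev-parse HEAD)

plain=$(git diff "$OLD" "$NEW" -- src/foo.java)
broken=$(git diff -B "$OLD" "$NEW" -- src/foo.java)
summary=$(git diff -B -M -C --name-status "$OLD" "$NEW")

echo "$broken" | head -n 5   # header includes a "dissimilarity index" line
echo "$summary"
```

With `-B`, the patch is presented as a wholesale deletion followed by a wholesale insertion instead of an interleaved edit, which is usually closer to how people think about a replaced file.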
