
One of my teammates created a Java file by copying another file. Guess what: in the Git repository, the new file looks like an edit of the original file. The two files are completely different, and I want the new file to have its own history, not a copied one. The pull request shows the file as copied, with the original's history of changes attached to it. Does anyone know how to remove this history that came with the copy? Is this intended behavior in Git?

Dan Hunex


In Git, files do not have history. History is attached to commits: commits have parent commits, which have their own parents, and so on. Each of these commits has an ID (a "true name" SHA-1 hash), an author and a committer (each with a name, email address, and time-stamp), and a log message as well.
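To make "history is attached to commits" concrete, here is a toy model in Python. The Commit class, its fields, and the helper functions are hypothetical illustrations, not Git's object format: each commit simply holds a full snapshot plus parent links, and a per-file log is computed on demand by walking parents and checking whether that path changed, loosely like git log --first-parent -- path.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Commit:
    # Hypothetical, simplified model: a full snapshot of every file plus parent links.
    # Real Git commits point to tree objects and are named by a hash; none of that is modeled.
    message: str
    files: Dict[str, str]                      # path -> file contents
    parents: List["Commit"] = field(default_factory=list)


def walk_first_parent(head: Commit) -> Iterator[Commit]:
    """Yield head, then its first parent, and so on back to a root commit."""
    commit: Optional[Commit] = head
    while commit is not None:
        yield commit
        commit = commit.parents[0] if commit.parents else None


def path_log(head: Commit, path: str) -> List[str]:
    """Messages of commits where `path` differs from the first parent's copy.
    Nothing is stored per file: this "file history" is computed on demand."""
    log = []
    for commit in walk_first_parent(head):
        parent_files = commit.parents[0].files if commit.parents else {}
        if commit.files.get(path) != parent_files.get(path):
            log.append(commit.message)
    return log


root = Commit("add foo", {"src/foo.java": "class Foo {}"})
second = Commit("add notes", {"src/foo.java": "class Foo {}", "notes.txt": "hi"}, [root])
third = Commit("edit foo", {"src/foo.java": "class Foo { int x; }", "notes.txt": "hi"}, [second])

print(path_log(third, "src/foo.java"))  # ['edit foo', 'add foo']
```

The point of the design is that nothing is ever recorded "for" src/foo.java; its log is derived entirely from comparing snapshots along the commit chain.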

Of course, everyone wants to see file history. So Git fakes it up: it compares one historical commit to another newer commit, diffing all the files in each commit. When it diffs the files, it may decide that the earlier commit's dir/README.txt was copied or renamed to the new commit's python/sourcefile.py. It will therefore tell you that the "history" of python/sourcefile.py is that it was dir/README.txt before.
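A rough sketch of that guessing step, assuming two snapshots stored as path-to-contents dicts: paths that disappeared are paired with paths that appeared whenever their contents are similar enough. Python's difflib stands in for Git's own similarity score, detect_renames and similarity are hypothetical names, and the best-match-first pairing is only an approximation of what Git does internally.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def similarity(old: str, new: str) -> float:
    # Stand-in score in [0, 1]. Git computes its own similarity index;
    # difflib is used here only because it is in the standard library.
    return SequenceMatcher(None, old, new).ratio()


def detect_renames(old_files: Dict[str, str], new_files: Dict[str, str],
                   threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Pair paths that vanished from the old snapshot with paths that appeared
    in the new one, best matches first, each path used at most once."""
    deleted = [p for p in old_files if p not in new_files]
    added = [p for p in new_files if p not in old_files]
    candidates = []
    for src in deleted:
        for dst in added:
            score = similarity(old_files[src], new_files[dst])
            if score >= threshold:
                candidates.append((score, src, dst))
    candidates.sort(reverse=True)              # highest similarity first
    used_src, used_dst, renames = set(), set(), []
    for score, src, dst in candidates:
        if src in used_src or dst in used_dst:
            continue
        used_src.add(src)
        used_dst.add(dst)
        renames.append((src, dst))
    return renames


old = {"dir/README.txt": "hello world\nline two\n"}
new = {"python/sourcefile.py": "hello world\nline two\nline three\n"}
print(detect_renames(old, new))  # [('dir/README.txt', 'python/sourcefile.py')]
```

Note that the "rename" here is purely an inference from content similarity; no commit ever recorded that README.txt became sourcefile.py.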

That history is completely imaginary, unless you don't want it to be, in which case, it is the utter, absolute truth! It's your decision whether to believe Git.

You also get some control knobs:

  • diff.renames: if true, ask Git to detect renamed files. If false, ask Git not to detect renames. If copies, ask Git to detect copied files as well as renamed files. The default is now true but was false in older (pre-2.9) versions of Git.

  • diff.renameLimit: how many files Git should treat as candidates for rename detection. Rename and copy detection is somewhat slow and memory-intensive, so Git has used limits of 500, then 1000, then 2000 as its defaults here. Setting this to 0 means "as many as Git can manage."

  • diff.algorithm (and many external diff driver configuration options): see the git config documentation.
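To see how the three diff.renames settings differ, here is an extended toy version of the same idea. All names are hypothetical, difflib again stands in for Git's scoring, and the rename_limit handling is deliberately crude. It models the question's situation: src/B.java is a near copy of src/A.java, which still exists. With rename detection only, nothing is reported because no file was deleted; with copies, B.java shows up as a copy of A.java.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple, Union


def similarity(old: str, new: str) -> float:
    # Illustrative stand-in for Git's own similarity score.
    return SequenceMatcher(None, old, new).ratio()


def classify_added(old_files: Dict[str, str], new_files: Dict[str, str],
                   renames: Union[bool, str] = True, rename_limit: int = 1000,
                   threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    """Return (kind, source, added_path) for added paths that look like a rename or copy.

    renames=False    -> no detection at all
    renames=True     -> only deleted paths can be sources (renames)
    renames="copies" -> paths still present can be sources too (copies); this is
                        closer to --find-copies-harder, since plain -C only looks
                        at sources that were modified in the same change
    rename_limit     -> skip detection when there are too many candidates; 0 = no limit
    """
    if renames is False:
        return []
    added = [p for p in new_files if p not in old_files]
    deleted = [p for p in old_files if p not in new_files]
    kept = [p for p in old_files if p in new_files]
    sources = deleted + (kept if renames == "copies" else [])
    if rename_limit and max(len(sources), len(added)) > rename_limit:
        return []  # simplified; Git's real limit handling is more nuanced
    result, used = [], set()
    for dst in added:
        scored = [(similarity(old_files[src], new_files[dst]), src)
                  for src in sources if src not in used]
        scored = [pair for pair in scored if pair[0] >= threshold]
        if not scored:
            continue
        _, src = max(scored)
        if src in deleted:
            used.add(src)  # in this sketch a deleted file feeds at most one rename
            result.append(("rename", src, dst))
        else:
            result.append(("copy", src, dst))
    return result


a = "class A {\n  int x;\n  int y;\n}\n"
old = {"src/A.java": a}
new = {"src/A.java": a, "src/B.java": a.replace("class A", "class B")}
print(classify_added(old, new))                    # []
print(classify_added(old, new, renames="copies"))  # [('copy', 'src/A.java', 'src/B.java')]
```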

Besides these, there are command-line switches for the same items, plus a "match threshold" value (-M) that defaults to "50% similar": a pair of files at least 50% similar counts as a rename, and a pair less than 50% similar does not.
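The -M value itself has a slightly odd notation, which the git diff documentation spells out: with a % sign it is a percentage, and without one it is read as a fraction with a decimal point in front, so -M5 is the same as -M50% and -M05 the same as -M5%. A small sketch of that parsing rule, with hypothetical names parse_score and counts_as_rename, handling only plain digits:

```python
def parse_score(spec: str) -> float:
    """Turn the <n> in -M<n> into a fraction.

    "90%" means 0.90; a number without a % sign gets a decimal point in front,
    so "5" means 0.5 and "05" means 0.05. An empty spec (plain -M) falls back
    to the 50% default. Only plain digits are handled in this sketch.
    """
    if spec == "":
        return 0.5
    digits, percent = (spec[:-1], True) if spec.endswith("%") else (spec, False)
    if not digits.isdecimal():
        raise ValueError(f"unsupported score: {spec!r}")
    if percent:
        return int(digits) / 100
    return int(digits) / 10 ** len(digits)


def counts_as_rename(similarity: float, spec: str = "") -> bool:
    # This sketch treats "at least the threshold" as a match.
    return similarity >= parse_score(spec)


for spec in ["", "5", "50%", "05", "90%"]:
    print(repr(spec), parse_score(spec))
```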

(Note that some commands, such as git merge, run their internal diffs with some of these settings changed from their defaults. There are controls for these as well.)

If the file has the same pathname

In your case, it sounds like what happened was:

  • An older commit, call it $OLD, had file src/foo.java.
  • Somewhere along the way src/foo.java was removed entirely (and committed, or not committed, it won't matter here).
  • Then later someone wrote a new file at src/foo.java, or copied some other file to that path, and committed; call this $NEW.
  • Now, when you compare $OLD to $NEW, Git insists on comparing the two unrelated files.

Here you are semi-stuck: Git believes that since the old src/foo.java and the new src/foo.java have the same name, it should compare their contents. However, there's one more control knob: git diff has a -B switch:

-B[<n>][/<m>], --break-rewrites[=[<n>][/<m>]]

       Break complete rewrite changes into pairs of delete and create. This serves two purposes:

       It affects the way a change that amounts to a total rewrite of a file not as a series of deletion and insertion mixed together with a very few lines that happen to match textually as the context, but as a single deletion of everything old followed by a single insertion of everything new, and the number m controls this aspect of the -B option (defaults to 60%). -B/70% specifies that less than 30% of the original should remain in the result for Git to consider it a total rewrite (i.e. otherwise the resulting patch will be a series of deletion and insertion mixed together with context lines).

       When used with -M, a totally-rewritten file is also considered as the source of a rename (usually -M only considers a file that disappeared as the source of a rename), and the number n controls this aspect of the -B option (defaults to 50%). -B20% specifies that a change with addition and deletion compared to 20% or more of the file's size are eligible for being picked up as a possible source of a rename to another file.
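The two numbers in -B<n>/<m> are easier to follow with some arithmetic. Here is a rough line-based model; break_check is a hypothetical name, Git measures changes in its own units rather than lines, and difflib stands in for its matcher. Following the excerpt above, n decides whether enough changed for the old contents to become a possible rename source, and m decides whether so little survived that the change is shown as one deletion plus one insertion.

```python
from difflib import SequenceMatcher
from typing import Dict


def break_check(old: str, new: str, n: float = 0.5, m: float = 0.6) -> Dict[str, bool]:
    """Line-based approximation of the two thresholds in `git diff -B<n>/<m>`."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = SequenceMatcher(None, old_lines, new_lines)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    deleted = len(old_lines) - kept
    added = len(new_lines) - kept
    size = max(len(old_lines), 1)  # measure against the original file's size
    return {
        # n: additions plus deletions are at least n of the file's size, so the
        # old contents are eligible as a rename source (when -M is also used).
        "rename_source": (added + deleted) / size >= n,
        # m: less than (1 - m) of the original remains, so show a total rewrite.
        "total_rewrite": kept / size < 1 - m,
    }


old10 = "\n".join(str(i) for i in range(10))
small_edit = break_check(old10, old10.replace("5", "X"))
rewrite = break_check("class Foo {\n  void a() {}\n}\n",
                      "interface Bar {\n  int b();\n  int c();\n}\n")
print(small_edit)
print(rewrite)
```

A one-line edit in a ten-line file trips neither threshold, while replacing nearly everything trips both; raising m to 0.7 (the -B/70% case from the excerpt) makes the "total rewrite" test stricter.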

So, by supplying -B and turning on rename and copy detection, you may get the output you want.
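As a sketch of putting this together, here is a small Python helper that builds and runs the corresponding git diff command. The flags -B, -M, -C, and --name-status are real git diff options; the helper names diff_args and run_diff and any revisions you pass in are hypothetical. Running it requires git on the PATH and a repository at repo.

```python
import subprocess
from typing import List


def diff_args(old: str, new: str, break_rewrites: bool = True,
              find_copies: bool = True) -> List[str]:
    """Build a `git diff` command line comparing two revisions.
    --name-status prints one status letter per path (e.g. M, A, D, R, C)."""
    args = ["git", "diff", "--name-status"]
    if break_rewrites:
        args.append("-B")   # break total rewrites into delete + create
    args.append("-M")       # detect renames
    if find_copies:
        args.append("-C")   # detect copies as well as renames
    return args + [old, new]


def run_diff(old: str, new: str, repo: str = ".") -> str:
    """Run the command inside `repo` and return its standard output."""
    result = subprocess.run(diff_args(old, new), cwd=repo,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

The command it builds is the same as typing git diff --name-status -B -M -C $OLD $NEW by hand; whether a rewritten file then shows up as a copy, a rename, or a plain modification depends on the thresholds discussed above.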

torek