
In my 'project/repo' I have two MS Visual Studio projects, one for the main code, and an independent one for tests. I have some files that are common to both (in the copy and paste sense) and I'd like to see / check which ones they are.

What are the right Git commands (or GUI menu clicks) to see whether I have used the same content blob twice in the overall repo tree? If I have read the tutorials correctly, Git should store a single SHA1 for the two copies of the same file content, and so already know about it. I am hoping Git has a command that finds and displays the file paths that share a blob.

Eventually I'd like to be able to find out the diffs between the versions when there is a common ancestor blob SHA1 (but not a common location). [i.e. during testing one version gets updated ahead of the other version...]

I know it isn't best practice to have such duplicates, but it is the way the work has ended up :-(

I have Msysgit and GitExtensions on Windows.

Philip Oakley

1 Answer


You can list every blob in the tree along with its path:

git ls-tree -r HEAD

Files with identical content show the same SHA1.

If you don't want to pick out the identical blobs by hand, sort the listing by SHA1 and highlight the repeats:

git ls-tree -r HEAD |
    sort -t ' ' -k 3 |
        perl -ne '$1 && / $1\t/ && print "\e[0;31m" ; / ([0-9a-f]{40})\t/; print "$_\e[0m"'

From: Git: Find duplicate blobs (files) in this tree
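
If the coloured listing is too long to scan, one variation is to group the paths by SHA1 and print only the blobs that occur more than once. This is a minimal sketch and not part of the linked answer: `find_dup_blobs` is a hypothetical helper name, and it assumes the default `git ls-tree -r` output format (mode, type and SHA1 separated by spaces, then a tab and the path).

```shell
# Hypothetical helper: reads `git ls-tree -r` output on stdin and prints
# each blob SHA1 that appears under more than one path, followed by the
# indented list of those paths.
find_dup_blobs() {
    awk -F '\t' '
        {
            split($1, meta, " ")           # meta[1]=mode, meta[2]=type, meta[3]=SHA1
            if (meta[2] != "blob") next    # skip submodule (commit) entries
            sha = meta[3]
            count[sha]++
            paths[sha] = paths[sha] "    " $2 "\n"
        }
        END {
            for (sha in count)
                if (count[sha] > 1)
                    printf "%s\n%s", sha, paths[sha]
        }
    '
}
```

From Git Bash at the top of the repository it would be used as `git ls-tree -r HEAD | find_dup_blobs`. The order in which the groups are printed is not defined, since it follows awk's array iteration order.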

manojlds
  • I'll have a try.. I think I see it.. I get the duplicates showing in red in a sorted long list of all the items, when I paste those three lines into Git Bash (opened at the top of the repo directory in Windows) – Philip Oakley Apr 25 '11 at 21:33
  • I'm sort of surprised it isn't available as a short command or GUI action. I expect lots of folks start 'here' (as in "if I were you I wouldn't start from here") – Philip Oakley Apr 25 '11 at 21:39