Say that I have something similar to this when I run git ls-tree -r master:

100644 blob a450cb6b6371494ab4b3da450f6e7d543bfe3493    FooBar/readme.txt
100644 blob a339338d7ad5113740244e7f7d3cbb236cb47115    Foobar/readme.txt

How can I remove the second blob from this tree object?

I'm assuming that this can be done on POSIX systems by just doing a git rm Foobar/readme.txt. How would I do the same thing on Windows?

syvex
  • Clearly you have a case-sensitivity issue. The problem here is that these are two different files on POSIX systems, but "the same" file on your Windows system. That means this is the same as http://stackoverflow.com/questions/2528589/git-windows-case-sensitive-file-names-not-handled-properly – torek Mar 20 '12 at 17:12
  • It's similar, but I'm looking for a solution in the general sense as well. I think the trick is just to be able to manually modify the tree object inside of git. – syvex Mar 20 '12 at 17:37
  • You can't really "modify the tree", but you can construct a new commit that points to a *new* tree (which points to another tree, etc) that does not have the extra stuff. The easiest way, by far, is to clone the repo to a POSIX system and just use regular git commands, make the commit, then fetch the commit over to the Windows repo. – torek Mar 20 '12 at 17:52
  • I'm looking for something more along the lines of `git read-tree` and `git write-tree`. http://progit.org/book/ch9-2.html – syvex Mar 21 '12 at 14:26
  • I realized I might be able to simulate the Windows behavior on my Mac, which has case-preserving but otherwise case-insensitive file systems by default. I can do `git rm --cached FooBar/readme.txt` or `git rm --cached Foobar/readme.txt` and that removes one or the other correctly. Is the action on Windows different? – torek Mar 22 '12 at 01:37

2 Answers

git filter-branch with --index-filter might work since you are operating on the index and not on the working tree. Try something like:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch Foobar/readme.txt' HEAD
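
If `git rm --cached` ends up folding case on your file system, one variant is to filter the textual index listing instead, where the path comparison is byte for byte. The sketch below follows the `git ls-files -s | ... | git update-index --index-info` pattern from the git filter-branch documentation. The function names `drop_exact_path` and `remove_colliding_entry` and the temporary file name `index.casefix` are made up for illustration, and I have not tested this on Windows:

```shell
#!/bin/bash

# Hypothetical helper: read `git ls-files -s` output on stdin and drop the
# entry whose path equals $1 byte for byte (no case folding involved).
drop_exact_path() {
    # Appending "" forces awk to compare as strings, never as numbers.
    victim="$1" awk -F'\t' '($2 "") != (ENVIRON["victim"] "")'
}

# Sketch only: rebuild the index without one entry.
# Assumes the default index location ($GIT_DIR/index).
remove_colliding_entry() {
    local gitdir
    gitdir=$(git rev-parse --git-dir) || return
    git ls-files -s | drop_exact_path "$1" |
        GIT_INDEX_FILE="$gitdir/index.casefix" git update-index --index-info &&
        mv "$gitdir/index.casefix" "$gitdir/index"
}
```

Something like `remove_colliding_entry Foobar/readme.txt` followed by `git commit` would then record a tree without that one entry. The same pipeline should also be usable as the body of an `--index-filter`, writing to `$GIT_INDEX_FILE.new` and moving it over `$GIT_INDEX_FILE`, if the entry has to disappear from history as well.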
rtn
  • Unfortunately it will try to remove whichever it thinks is the current one. So even if you try `git rm -f Foobar/readme.txt`, it will just remove FooBar/readme.txt instead. – syvex Mar 20 '12 at 16:44
  • Aaah but of course. Windoze and its sucky file system. Trying to find a better answer for you, but I have a feeling this will be rather tricky. Can you even clone that repository onto Windows without problems? – rtn Mar 20 '12 at 17:22

OK, so I spent a little time testing this out on macOS, which has similar problems with case folding.

I don't know whether all versions of git behave "the same enough", or whether git on Windows works the same way, but this script does the trick without going any deeper into git plumbing than ls-tree -r, cat-file, and rm --cached.

The script is also only lightly tested. (Note: tabs are getting smashed, cmd-C/cmd-V pasted the tabs in but I had to indent for stackoverflow. So the file indentation is goofed up below ... too lazy to fix here.)

#! /bin/bash

usage()
{
cat << EOF
usage: $0 [-h] [-r] [branch]

-h: print usage help
-r: rename ALL colliding files to their hashes
EOF
}

DO_RENAME=false
while getopts "hr" opt; do
case $opt in
h) usage; exit 0;;
r) DO_RENAME=true;;
*) usage 1>&2; exit 1;;
esac
done
shift $(($OPTIND - 1))

case $# in
0) branch=HEAD;;
1) branch=$1;;
*) usage 1>&2; exit 1;;
esac

# literal tab, so that it's easily distinguished from spaces
TAB=$(printf \\t)

branch=$(git rev-parse --symbolic $branch) || exit

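# explicit template so this works with both BSD and GNU mktemp
tempfile=$(mktemp "${TMPDIR:-/tmp}/git-casecoll.XXXXXX") || exit
# remove the temp file on any exit, keeping the real exit status;
# on a signal, exit non-zero (which also runs the exit trap)
trap 'rm -f "$tempfile"' 0
trap 'exit 1' 1 2 3 15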

# First, let's find out whether there *are* any file name
# case collisions in the tree.
git ls-tree -r $branch > $tempfile
nfiles=$(wc -l < $tempfile | sed 's/  *//g')
n2=$(sort "-t$TAB" -k2 -f -u $tempfile | wc -l | sed 's/  *//g')
if [ $nfiles -eq $n2 ]; then
echo no collisions found
exit 0
fi
echo "$(($nfiles - $n2)) collision(s) found"

# functions needed below

# decode git escapes in pathnames
decode_git_pathname()
{
local path="$1"
case "$path" in
\"*\")
    # strip off leading and trailing double quotes
    path=${path#\"}
    path=${path%\"}
    # change every % into %% (note the //: a single / replaces only the first)
    path=${path//\%/%%}
    # and then interpret backslashes with printf
    printf -- "$path";;
*)
    # not encoded, just print it as is
    printf %s "$path";;
esac
}

show_or_do_rename()
{
local mode=$1 path="$(decode_git_pathname "$2")" sha1=$3
local renamed_to="$(dirname "$path")/$sha1"
local ftype=${mode:0:2}

if [ $ftype != 10 ]; then
    echo "WARNING: I don't handle $ftype files ($mode $path) yet"
    return 1
fi
if $DO_RENAME; then
    # git mv does not work, but git rm --cached does
    git rm --cached --quiet "$path"
    rm -f "$path"
    git cat-file -p $sha1 > "$renamed_to"
    chmod ${mode:2} "$renamed_to"
    git add "$renamed_to"
    echo "renamed: $path => $renamed_to"
else
    # non-regular files were already rejected above
    echo will: mv "$path" "$renamed_to"
fi
}

# Now we have to find which ones they were, which is more difficult.
# We still want the sorted ones with case folded, but we don't want
# to remove repeats, instead we want to detect them as we go.
#
# Note that Dir/file collides with both dir/file and dir/File,
# so if we're doing rename ops, we'll rename all three.  We also
# don't know if we're in a collision-group until we hit the second
# entry, so the first time we start doing a collision-group, we
# must rename two files, and from then on (in the same group) we
# only rename one.
prevpath=""
prevlow=""
prevsha=
in_coll=false
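# sort on the path field only (as in the count above); sorting whole
# lines would order by mode and hash, not by name
sort "-t$TAB" -k2 -f $tempfile |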
while IFS="$TAB" read -r info git_path; do
    set -- $info
    mode=$1
    # otype=$2  -- we don't care about the object type?
    # it should always be "blob"
    sha1=$3
    lowered="$(printf %s "$git_path" | tr '[:upper:]' '[:lower:]')"
    if [ "$prevlow" = "$lowered" ]; then
        if $in_coll; then
            echo "      and: $prevpath vs $git_path"
            show_or_do_rename $mode "$git_path" $sha1
        else
            echo "collision: $prevpath vs $git_path"
            # the previous entry keeps its own mode
            show_or_do_rename $prevmode "$prevpath" $prevsha
            show_or_do_rename $mode "$git_path" $sha1
            in_coll=true
        fi
    else
        prevlow="$lowered"
        prevpath="$git_path"
        prevsha=$sha1
        prevmode=$mode
        in_coll=false
    fi
done
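
For reference, the detection step on its own can be sketched as a single awk pass that counts lower-cased paths, instead of sorting and comparing neighbors as the script does. This is a simplified illustration, not a replacement for the script: `list_case_collisions` is a made-up name, it assumes git did not quote any of the paths, and like the script it only compares whole paths, so `FooBar/a.txt` against `Foobar/b.txt` (directories colliding, file names differing) goes unnoticed.

```shell
#!/bin/bash

# Hypothetical helper: read `git ls-tree -r` output on stdin and print every
# path whose lower-cased form is shared with at least one other path.
# Assumes unquoted paths (no special characters in file names).
list_case_collisions() {
    awk -F'\t' '
        # remember each path and count how often its folded form occurs
        { key = tolower($2); count[key]++; paths[NR] = $2; keys[NR] = key }
        END {
            # print colliding paths in their original input order
            for (i = 1; i <= NR; i++)
                if (count[keys[i]] > 1)
                    print paths[i]
        }'
}
```

Typical use would be `git ls-tree -r HEAD | list_case_collisions`; empty output means no whole-path collisions were found.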

Here's a sample run. I made a "bad for Windows" repo on a Linux box, then cloned it over to a Mac.

$ git clone ...
Initialized empty Git repository in /private/tmp/caseissues/.git/
remote: Counting objects: 16, done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 16 (delta 1), reused 0 (delta 0)
Receiving objects: 100% (16/16), done.
Resolving deltas: 100% (1/1), done.
$ cd caseissues
$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   FooBar/readme.txt
#
no changes added to commit (use "git add" and/or "git commit -a")
$ git-casecoll.sh 
1 collision(s) found
collision: FooBar/readme.txt vs Foobar/readme.txt
will: mv FooBar/readme.txt FooBar/31892d33f4a57bff0acd064be4bb5a01143dc519
will: mv Foobar/readme.txt Foobar/591415e1e03bd429318f4d119b33cb76dc334772
$ git-casecoll.sh -r
1 collision(s) found
collision: FooBar/readme.txt vs Foobar/readme.txt
renamed: FooBar/readme.txt => FooBar/31892d33f4a57bff0acd064be4bb5a01143dc519
renamed: Foobar/readme.txt => Foobar/591415e1e03bd429318f4d119b33cb76dc334772
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   renamed:    FooBar/readme.txt -> FooBar/31892d33f4a57bff0acd064be4bb5a01143dc519
#   renamed:    Foobar/readme.txt -> Foobar/591415e1e03bd429318f4d119b33cb76dc334772
#

(At this point I pick my own names for the renamed files. Note: I let the shell autocomplete the second path and had to try again, manually lower-casing the b in FooBar, because of the case weirdness.)

$ git mv FooBar/31892d33f4a57bff0acd064be4bb5a01143dc519 FooBar/readme_A.txt
$ git mv FooBar/591415e1e03bd429318f4d119b33cb76dc334772 FooBar/readme_B.txt
fatal: not under version control, source=FooBar/591415e1e03bd429318f4d119b33cb76dc334772, destination=FooBar/readme_B.txt
$ git mv Foobar/591415e1e03bd429318f4d119b33cb76dc334772 FooBar/readme_B.txt
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   renamed:    FooBar/readme.txt -> FooBar/readme_A.txt
#   renamed:    Foobar/readme.txt -> FooBar/readme_B.txt
#
$ git commit -m 'fix file name case issue'
[master 4ef3a55] fix file name case issue
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename FooBar/{readme.txt => readme_A.txt} (100%)
 rename Foobar/readme.txt => FooBar/readme_B.txt (100%)
torek