
I have files with escape \ characters in their names stored in a Git repository on Debian 10 Linux.

Problem: files whose names contain characters that are invalid on Windows cannot be checked out there.

Example:

git log --all --name-only -m --pretty= '*\\*'
"systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

I get the following Git errors when checking out on Windows:

C:\Git\bin\git.exe reset --hard "5ef1cac3a03304c35b455edf32bd1bb78060c5b9" --
error: invalid path 'systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount'
fatal: Could not reset index file to revision '5ef1cac3a03304c35b455edf32bd1bb78060c5b9'.
Done

Steps to reproduce:

# Clone the repository, so that the rewrite runs on a disposable copy:
git clone --no-local /source/repo/path/ /target/path/to/repo/clone/
# Cloning into '/target/path/to/repo/clone'...
# remote: Enumerating objects: 9534, done.
# remote: Counting objects: 100% (9534/9534), done.
# remote: Compressing objects: 100% (4776/4776), done.
# remote: Total 9534 (delta 4215), reused 8043 (delta 3136), pack-reused 0
# Receiving objects: 100% (9534/9534), 7.41 MiB | 16.78 MiB/s, done.
# Resolving deltas: 100% (4215/4215), done.

cd /target/path/to/repo/clone/

# List the files with escape \ from repo history into a list file:
git log --all --name-only -m --pretty= '*\\*' | sort -u >/opt/git_repo_files_w_escape.txt

# Remove the files with escape \ from repo history:
git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_escape.txt
Parsed 592 commits
New history written in 0.25 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 71128f3 .gitignore: ADD snap-git to be ignored
Enumerating objects: 9354, done.
Counting objects: 100% (9354/9354), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3694/3694), done.
Writing objects: 100% (9354/9354), done.
Total 9354 (delta 4085), reused 9354 (delta 4085), pack-reused 0
Completely finished after 0.55 seconds.


# List files with escape \ to check result:
git log --format="reference" --name-status --diff-filter=A '*\\*'
# "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

# Unfortunately, filter-repo seems to have run, but the log still lists filenames with escape \ :-(

Question:

1) How can I remove from the Git repo history all files whose path contains at least one escape \ character?

(Reason: files whose names contain characters that are invalid on Windows cannot be checked out there.)

UPDATE1:

I tried replacing the \\x2d string with - in the input file list, as suggested, but the files were still not removed from the Git history:

# List the files with escape \ from repo history into a list file:
git log --all --name-only -m --pretty= '*\\*' | sort -u >/opt/git_repo_files_w_escape.txt

# Replace the \\x2d string with - in git_repo_files_w_escape.txt:
sed -i 's/\\\\x2d/-/g' /opt/git_repo_files_w_escape.txt

# Remove the listed files from repo history:
git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_escape.txt
Parsed 592 commits
New history written in 0.25 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 71128f3 .gitignore: ADD snap-git to be ignored
Enumerating objects: 9354, done.
Counting objects: 100% (9354/9354), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3694/3694), done.
Writing objects: 100% (9354/9354), done.
Total 9354 (delta 4085), reused 9354 (delta 4085), pack-reused 0
Completely finished after 0.55 seconds.


# List files with escape \ to check result:
git log --format="reference" --name-status --diff-filter=A '*\\*'
# "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

# Unfortunately, the log still lists filenames with \\x2d :-(

UPDATE2:

I tried replacing \\x2d in git_repo_files_w_escape.txt with \\\\x2d or \x2d, but neither removed the files with \\x2d in their names from the Git history.

UPDATE3:

I'm looking for a working solution based on git filter-repo.

Any other ideas?

klor
  • Colon is not backslash so what are we even talking about here? – matt Jan 17 '23 at 17:33
  • And otherwise isn't this the same as your https://stackoverflow.com/questions/75112545/how-to-remove-all-files-from-git-repo-history-with-path-having-colon-in-filena ? – matt Jan 17 '23 at 17:34
  • Colon was a typo. Fixed in OP. – klor Jan 17 '23 at 17:35
  • Also, a backslash is not in itself an escape character. It's just a backslash. – matt Jan 17 '23 at 17:36
  • Well, in Linux bash you have to escape special chars in a path with a backslash. So it is an escape char. – klor Jan 17 '23 at 17:38
  • But that doesn't make the escape backslash a character in the resulting path. It's just a way of talking to bash. – matt Jan 17 '23 at 17:43
  • Not the same as [#75112545](https://stackoverflow.com/questions/75112545/how-to-remove-all-files-from-git-repo-history-with-path-having-colon-in-filena), because the working solution used in #75112545 for colon, does not work for backslash. So it requires opening a new question. – klor Jan 17 '23 at 17:43
  • Only if one doesn't understand string escaping, perhaps. Otherwise they are identical. – matt Jan 17 '23 at 17:44
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/251219/discussion-between-klor-and-matt). – klor Jan 17 '23 at 17:49
  • Basically, you are just trying to figure out how to escape a [Unicode hyphen-minus](https://www.fileformat.info/info/unicode/char/2d/index.htm) in a path in Windows. – JDB Jan 17 '23 at 17:52
  • @JDB No. I am trying to remove files with invalid characters in their filenames from the Git repo history. – klor Jan 17 '23 at 17:54
  • @klor - I think that `\\x2d` is an escape sequence which is translated to `-`. I bet you could make your code work if you changed your paths to `systemd/system/default.target.wants/snap-git-filter-repo-7.mount`, etc. – JDB Jan 17 '23 at 18:01
  • @JDB Good idea. Almost sure this is the problem source. I will try this immediately. – klor Jan 17 '23 at 18:03
  • @JDB Unfortunately not. I edited git_repo_files_w_escape.txt, replaced \\x2d with - and executed `git filter-repo`. After listing the git log, the files are still there. – klor Jan 17 '23 at 18:13
  • Did you try 4 slashes, for testing? `\\\\x2d`. – VonC Jan 17 '23 at 19:29
  • @VonC Yes. See the updated OP. `sed -i 's/\\\\x2d/-/g' /opt/git_repo_files_w_escape.txt` – klor Jan 17 '23 at 19:40
  • @klor Sorry, I meant keeping the `\x2d` in the file (so no sed, no replacement by '-'), but using in that file `\\x2d` (2 slash) or `\\\\x2d` (4 slash) for testing. Again, no sed, edit the file directly. – VonC Jan 17 '23 at 19:44
  • @VonC Both were unsuccessful, `\\x2drepo-7.mount` and `\\\\x2drepo-7.mount`, too :-( – klor Jan 17 '23 at 20:16
  • Does it have to be a `git-filter-repo` solution, or is the important aspect that you need to change the whole history regardless of the tool used? – j6t Jan 20 '23 at 10:35
  • @j6t Well, history rewriting is a risky task. `git filter-branch` became obsolete, because `git filter-repo` does the same task more safely and faster. This is the reason I stick to `git filter-repo`. – klor Jan 20 '23 at 19:12

3 Answers


You fed bad input into filter-repo, based on a common but incorrect assumption about how git log works.

Look at your own output:

$ git log --format="reference" --name-status --diff-filter=A '*\\*'
"systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

Let's look at the first line as an example. If you were to store that in a file, which you pass to --paths-from-file, then git-filter-repo is going to be looking for a file named "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount" to remove. You have no such file in your repository. Instead you have one named systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount. (Note that I have removed both " characters and two of the \ characters.)

The problem here is that you assumed git log would list filenames as-is, which it won't do whenever they contain special characters. You can often get around this by setting core.quotepath=false (this particularly helps when you have non-ASCII characters), but even that setting is ignored when the name contains backslashes.
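To make the quoting concrete, here is a minimal Python sketch of the reverse transformation. It decodes the C-style quoting git applies to such paths: surrounding double quotes, backslash escapes such as `\\` and `\"`, and three-digit octal escapes for other bytes. The name `unquote_git_path` is hypothetical (it is not part of git or filter-repo), and the sketch handles only the escapes listed in its table.

```python
# Hypothetical helper: undo git's C-style path quoting (illustration only).
_SIMPLE_ESCAPES = {
    b'a': 7, b'b': 8, b'f': 12, b'n': 10, b'r': 13,
    b't': 9, b'v': 11, b'"': 34, b'\\': 92,
}


def unquote_git_path(field: bytes) -> bytes:
    """Return the raw path bytes for one path as printed by git log."""
    # Paths without special characters are printed unquoted; pass them through.
    if len(field) < 2 or not (field.startswith(b'"') and field.endswith(b'"')):
        return field
    body = field[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        char = body[i:i + 1]
        if char != b'\\':
            out += char
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        if nxt in _SIMPLE_ESCAPES:
            # Two-character escape such as \\ or \n.
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
            continue
        digits = body[i + 1:i + 4]
        if len(digits) == 3 and all(d in b'01234567' for d in digits):
            # Three-digit octal escape for a single byte, e.g. \303.
            out.append(int(digits.decode('ascii'), 8))
            i += 4
            continue
        raise ValueError('unsupported escape in %r' % field)
    return bytes(out)
```

Applied to the first line of the log output above, this should yield the name with single backslashes, which is the name the repository actually contains.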

Here's something that might work better for you for generating the list of filenames to exclude:

git log -z --all --name-only -m --pretty= '*\\*' | tr '\0' '\n' | sort -u >/opt/git_repo_files_w_escape.txt

but it assumes you do not have filenames with newline characters. (If you do have files with newline characters, though, then --paths-from-file won't work for you.)
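If you would rather build the list from a script, the same idea can be sketched in Python. This assumes `git` is on the PATH; the function names `unique_paths_from_z` and `list_backslash_paths` are hypothetical. With `-z`, names arrive NUL-separated and unquoted, so no unescaping is needed.

```python
import subprocess


def unique_paths_from_z(raw: bytes) -> list:
    """Split NUL-delimited name output into a sorted list of unique paths."""
    return sorted({path for path in raw.split(b'\0') if path})


def list_backslash_paths(repo: str) -> list:
    """Run the git log command shown above inside `repo` (assumes git is installed)."""
    result = subprocess.run(
        ['git', '-C', repo, 'log', '-z', '--all', '--name-only', '-m',
         '--pretty=', '--', r'*\\*'],  # raw string: git sees the glob *\\*
        check=True, capture_output=True,
    )
    return unique_paths_from_z(result.stdout)
```

Writing the returned names to a file, one per line, gives the same input for --paths-from-file as the pipeline above, with the same caveat about newlines in filenames.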

Even simpler would be to skip building a list of badly named files and just remove them programmatically by pattern:

git filter-repo --filename-callback 'return None if b"\\" in filename else filename'
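For reference, here is roughly what that callback body does, written out as standalone Python. filter-repo passes each `filename` as bytes; returning `None` drops the file, and returning a different name renames it. The function names below are hypothetical, and the rename variant is only an illustration for the `\x2d` names in this question.

```python
def drop_backslash_names(filename: bytes):
    """Same logic as the one-line callback: drop any path containing a backslash."""
    return None if b'\\' in filename else filename


def rename_backslash_escapes(filename: bytes):
    """Illustrative variant: keep the file but turn the literal text \\x2d into '-'."""
    return filename.replace(b'\\x2d', b'-')
```

The body of either function could be passed to --filename-callback as a string, provided the shell quoting keeps the Python bytes literal intact.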
  • @newren Thank you very much for pointing me to the right solution! Your solution works perfectly, it removed all files having a backslash in the filename. You are right, it is not a bug, just the git log result was not in the right format for input into `git filter-repo`. – klor Jan 22 '23 at 07:20

FWIW, this worked on a Linux system and let me rewrite the HEAD commit without having the files checked out on disk:

git ls-files | grep -a -e '\\' | while read f; do
    f=$(echo $f | sed -e 's|"||g')
    new=$(echo "$f" | sed -e 's|\\\\x2d|-|g')
    git show "@:$f" > "$new"
    git rm --cached "$f"
    git add "$new"
done

git status
git commit --amend

The same commands should work in Git Bash on Windows.

LeGEC
  • Thank you for your answer. But this answer rewrites only the HEAD, not the whole repo history. I'm looking for a working solution based on git filter-repo. – klor Jan 20 '23 at 10:05
  • @klor: if you take the command as is, yes. It also provides a base for writing a set of commands that renames files containing '\\' in their names, which could give you a way to turn it into a script which you can invoke with `git filter-branch` for example. Unfortunately I don't have enough time to research a complete solution to your issue. – LeGEC Jan 20 '23 at 11:06
  • also check https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#_filtering_based_on_many_paths – LeGEC Jan 20 '23 at 11:15
  • perhaps fiddling with something like `regex:(.*)(\\\\x2d)(.*)=>\1-\3` (try it on a smaller repo to check the effects) – LeGEC Jan 20 '23 at 11:16

Assuming you have many files that you want to fix scattered in the hierarchy, a solution with git filter-repo looks tedious. You can instead use a combination of git fast-export and git fast-import to modify file names in the whole history.

git fast-export --no-data --all > exported

Now delete the file entries containing a backslash:

grep -v '^[DM] .*\\' exported > fixed

Instead of removing the files, you can also modify the file names. For example, to replace each backslash with a dash -, you could try this:

sed -e '/^[DM] /s,\\,-,g' < exported > fixed

You may now investigate the difference between the two files to ensure that no commit messages were modified:

diff -u exported fixed | less

Now attempt to import the modified history:

git fast-import < fixed

This will stop with an error that tells you that the branches will not be modified because the old branch heads are not subsets of the new heads. If there are no other errors, you can now force the modification:

git fast-import --force < fixed
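A line-oriented grep or sed also looks at the lines of commit messages, which travel in the same stream. As a hedged alternative, here is a minimal Python sketch that copies every `data <count>` payload verbatim and only filters `M`/`D` lines outside those payloads. The function name `drop_backslash_paths` is hypothetical, and the sketch assumes the stream came from `git fast-export --no-data` as above, so the payloads are commit and tag messages, not blobs.

```python
import io


def drop_backslash_paths(stream: bytes) -> bytes:
    """Remove file-change lines whose path contains a backslash from a fast-export stream."""
    src = io.BytesIO(stream)
    dst = io.BytesIO()
    while True:
        line = src.readline()
        if not line:
            break
        if line.startswith(b'data '):
            # "data <count>" is followed by exactly <count> raw bytes
            # (a commit or tag message here); copy them untouched.
            count = int(line[5:].decode('ascii').strip())
            dst.write(line)
            dst.write(src.read(count))
            continue
        if line[:2] in (b'M ', b'D ') and b'\\' in line:
            # File modify/delete entry for a path containing a backslash: skip it.
            continue
        dst.write(line)
    return dst.getvalue()
```

Reading `exported` in binary mode, passing its contents through this function, and writing the result to `fixed` would stand in for the grep step. It is still worth diffing the two files before importing.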
j6t
  • Nah, not tedious at all with filter-repo. But, more importantly, suggesting programmatic edits of fast-export should be accompanied with big warnings. The `--no-data` avoids the worst problems, but you should really emphasize how important that option is to avoid folks modifying your solution for other problems where they drop that option and then corrupt their repo. Also, even with --no-data, there's a risk that you will be removing lines from commit messages and corrupting the stream. filter-repo was written in part because editing fast-export streams programmatically can be risky. – Elijah Newren Jan 22 '23 at 07:58