
I have a simple (and hopefully quickly solvable) problem. I created a Git repository on Windows using the msys/TortoiseGit tools. So far, so good. I then copied the repo to a USB stick and carried it over to a Linux machine.

Unfortunately, there are files in the repo that contain German umlauts etc. (äöüß). On pure Windows there is no problem, and I know that on pure Linux there is no problem with that either.

When I now clone the repo locally, the umlauts are replaced by other characters that cannot be displayed on my machine (they show up as a ?). At the moment I cannot reach the Windows machine to modify anything. It also needs to work soon, as I will shortly lose access to the Windows machine altogether.

Therefore:

  • Can I (somehow) modify the repository to ensure the correct character set?
  • Can I avoid this situation when I create new repositories?
  • Can I (using only Linux) clone the repo such that it works transparently?
  • [edit] How can I rewrite the repo so that (at least) the file names end up in the right charset?
Christian Wolf

2 Answers

  • Aside from rewriting the archive, not that I know of.

  • Make sure your Windows editors use UTF-8 instead of a local codepage.
    See below.

  • Your Linux might ship with non-UTF-8 locales. To a certain extent,

    LANG=de_DE.iso88591@euro
    

    will request that your programs read and write in the same encoding. However, this is not a complete fix; for example Gtk+ assumes filenames are in UTF-8 regardless of the content encoding.

  • The usual way to rewrite a Git repo is using git-filter-branch. Here is an example I made just now that should re-encode filenames, file contents, and commit messages from ISO-8859-1 to UTF-8.

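    # Note: the index filter uses bash features ([[ ]], read -d ""), so it fails
    # if the shell running git filter-branch (usually /bin/sh) is not bash.
    CONVERT='iconv -fiso8859-1 -tutf-8'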
    git filter-branch \
        --index-filter '
            git ls-files -z --stage |
            while read -d "" mode ref stage name; do
                [[ "$stage" = 0 || "$stage" = 1 ]] &&
                printf "0 0000000000000000000000000000000000000000\t%s\0" "$name"
                newname="$(echo "$name" | '"$CONVERT"')"
                newref="$(
                    git cat-file blob "$ref" |
                    '"$CONVERT"' |
                    git hash-object -w --stdin)"
                printf "%s %s %s\t%s\0" "$mode" "$newref" "$stage" "$newname"
            done |
            git update-index -z --index-info' \
        --msg-filter "$CONVERT" \
        $(git for-each-ref --format='%(refname)' refs/heads refs/tags)
    

    Be careful: I haven't tested this in the presence of merges or binary files, and it's easy to destroy a lot of history with git-filter-branch. In case something goes wrong, git keeps backups of all positive refs (rewritten or not) in the refs/original namespace.
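
Before rewriting anything, it may help to see which file names are actually affected. The following is a minimal sketch and not part of the filter above: `non_utf8_names` is a hypothetical helper that reads one path per line and, for every path that is not valid UTF-8, prints how it would read if its bytes were ISO-8859-1. It assumes the names were stored as ISO-8859-1 (the same assumption the filter makes) and that no file name contains a newline.

```shell
# Hypothetical helper: read one path per line on stdin and print every path
# that is NOT valid UTF-8, re-encoded under the assumption that its bytes
# are ISO-8859-1.
non_utf8_names() {
    while IFS= read -r name; do
        # iconv exits non-zero when the input is not valid UTF-8.
        if ! printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$name" | iconv -f ISO-8859-1 -t UTF-8
        fi
    done
}
```

Inside a repository it could be fed with `git ls-files -z | tr '\0' '\n' | non_utf8_names`. If the printed names look right, ISO-8859-1 is probably the correct source encoding for the rewrite; if they look wrong, another codepage may have been in use.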


Just found an amazing answer by VonC: on Windows, use msysgit ≥ 1.7.10, and to fix up an existing repository, there's the recodetree binary (filenames only, unlike the above).

ephemient

Git stores files as binary blobs, so you cannot fix it without rewriting the whole history. It is not about modifying the repo, but modifying the files.
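
To illustrate why a re-encoding necessarily changes history, here is a small sketch (an illustration added for this point, assuming `iconv` and `od` are available). The same character "ü" is a different byte sequence in ISO-8859-1 and in UTF-8, so Git sees two different blobs and assigns them different IDs.

```shell
# "ü" in ISO-8859-1 is the single byte 0xFC (octal 374).
latin1_hex=$(printf '\374' | od -An -tx1 | tr -d ' \n')
# Converted to UTF-8, the same character becomes the two bytes 0xC3 0xBC.
utf8_hex=$(printf '\374' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')
echo "ISO-8859-1: $latin1_hex  UTF-8: $utf8_hex"

# Different bytes give different blob IDs (only runs if git is on PATH).
if command -v git >/dev/null 2>&1; then
    printf '\374\n' | git hash-object --stdin
    printf '\374\n' | iconv -f ISO-8859-1 -t UTF-8 | git hash-object --stdin
fi
```

Since every commit refers to its blobs by ID, changing the bytes of a file changes the IDs of all commits that contain it, which is why the fix amounts to a history rewrite.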

mateusza
  • So, what...? I cannot use the archive, and I am damned to use Windows in the future for all those repos? That looks quite bad! About the rewrite: how do I do that? Is it possible? – Christian Wolf Jun 03 '12 at 13:21