I have a Java project containing files with umlaut characters in their names. How do I have to set up a Git repository so that it can be used with the EGit plugin for Eclipse?

I already tried a simple

 git init
 git add *

But this turned out not to work, as you can see in this post.

I think I somehow have to tell Git to treat the file names as UTF-8.

I'm on Mac OS X 10.7, but I've seen the same problem on Windows 7 Pro.

Any ideas?

BetaRide
  • possible duplicate of [Eclipse EGit and git on command line show different status](http://stackoverflow.com/questions/10444398/eclipse-egit-and-git-on-command-line-show-different-status) – Wooble May 04 '12 at 11:17
  • No, it's a follow up. The question contains a link. – BetaRide May 04 '12 at 12:10

3 Answers

Mac OS X encodes characters in filenames in decomposed form (NFD), while most other systems use the composed form (NFC). When you add files using Git, the decomposed form enters the repository, since Git (except on Windows) does not recode the filenames it reads from disk.

EGit assumes the composed form is used. If you use non-ASCII names on a Mac, use only EGit or JGit with the repo, unless you are aware of the issues.

Git for Windows as of 1.7.10 uses UTF-8 NFC.

Composed form means that a character like "Ä" is encoded as one Unicode character, while decomposed form means it is encoded as "A" + "Add two dots above previous character".

$ touch Ä
$ echo Ä|od -tx1a
0000000    c3  84  0a                                                    
           ?  84  nl                                                    
0000003
$ ls|od -tx1a
0000000    41  cc  88  0a                                                
           A   ?  88  nl                                                
0000004
$ 
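The same difference can be reproduced without a Mac. The following is a minimal sketch using Python's standard `unicodedata` module; the variable names are only for illustration and Python itself is not part of the original setup.

```python
import unicodedata

composed = "\u00c4"      # "Ä" as a single code point (NFC)
decomposed = "A\u0308"   # "A" followed by COMBINING DIAERESIS (NFD)

# Both strings look the same on screen but are different sequences.
print(composed == decomposed)

# UTF-8 bytes of each form, comparable to the od output above.
print(composed.encode("utf-8").hex(" "))
print(decomposed.encode("utf-8").hex(" "))

# Normalization converts one form into the other.
print(unicodedata.normalize("NFD", composed) == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The composed form encodes to the bytes `c3 84` and the decomposed form to `41 cc 88`, which are the byte sequences shown by `echo` and `ls` in the transcript.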

Update: Since version 1.7.12, native Git on OS X has an option to precompose non-ASCII characters in a way that is compatible with both EGit and Git for Windows. To enable it, set core.precomposeunicode to true, for example with git config core.precomposeunicode true.

robinr

AFAIK JGit and EGit always use UTF-8 path encoding, but native Git doesn't guarantee that [1].

[1] Search for "encoding" in http://schacon.github.com/git/git-commit.html; see also http://git.661346.n2.nabble.com/Path-character-encodings-td7346463.html
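
To illustrate why this matters, here is a small sketch in Python. It assumes, purely as an example, that a path was stored in Latin-1 instead of UTF-8; the helper `is_valid_utf8` and the file name are hypothetical and not taken from Git, JGit or EGit.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw path bytes can be decoded as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

name = "\u00c4.java"  # "Ä.java", example file name

utf8_path = name.encode("utf-8")      # what a UTF-8-only tool expects
latin1_path = name.encode("latin-1")  # assumed non-UTF-8 path encoding

print(is_valid_utf8(utf8_path))
print(is_valid_utf8(latin1_path))
```

A tool that always decodes paths as UTF-8 can read the first byte sequence but not the second, so such a path would have to be guessed at or shown garbled.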

Matthias Sohn

Note: even with core.precomposeunicode set to true, you can still run into issues, as illustrated in commit 750b2e4 by Jeff King (peff):

t3910: show failure of core.precomposeunicode with decomposed filenames

If you have existing decomposed filenames in your git repository (e.g., that were created with older versions of git that did not precompose unicode), a modern git with core.precomposeunicode set does not handle them well.

The problem is that we normalize the paths coming from the disk into their precomposed form, and then compare them against the literal bytes in the index.
This makes things better if you have the precomposed form in the index.
It makes things worse if you actually have the decomposed form in the index.

As a result, paths with decomposed filenames may have their precomposed variants listed as untracked files (even though the precomposed variants do not exist on-disk at all).
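
The effect described in that commit message can be imitated with a toy model in Python. This is only a sketch of the comparison logic as described above, not Git's actual code; the function `untracked` and the file name are made up for illustration.

```python
import unicodedata

def untracked(disk_names, index_names, precompose):
    """Return on-disk names that have no byte-identical entry in the index."""
    if precompose:
        # Models core.precomposeunicode: paths read from disk become NFC.
        disk_names = [unicodedata.normalize("NFC", n) for n in disk_names]
    index_bytes = {n.encode("utf-8") for n in index_names}
    return [n for n in disk_names if n.encode("utf-8") not in index_bytes]

nfd = "A\u0308.java"                     # decomposed name, as OS X returns it
nfc = unicodedata.normalize("NFC", nfd)  # precomposed variant

# Old repository: decomposed name in the index, no precomposition.
print(untracked([nfd], [nfd], precompose=False))
# Same index with precomposition on: the NFC variant shows up as untracked.
print(untracked([nfd], [nfd], precompose=True))
# Index already holds the precomposed name: nothing is untracked.
print(untracked([nfd], [nfc], precompose=True))
```

In this model only the second call reports a file, and the name it reports is the precomposed variant, although only the decomposed name was passed in as the on-disk name.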

VonC