
Edit: git does not mess with character encoding. This question is still here to share knowledge and to keep others from making the same mistake.


The context: My company uses an SVN repository. I'm using git-svn as a client to interact with this repository. All text files in the project are (and must be) encoded with the Windows default encoding (cp-....). I use Git Extensions, and sometimes the command line, to drive git.

What I did: Over the last 3 days I worked on a new feature and made a number of local commits. Finally, I squashed all these commits into a single one using an interactive rebase, then used git svn dcommit to push everything to the SVN repository as a single commit.

What happened then: A colleague told me that all accents were messed up in the files that I modified, and in the new files, after my commit. I had already committed text files with accents to the same repository with my installation of git + svn before, and this is the first time I have faced this issue.

My investigation: I opened the files with Notepad++ and tried the most common encodings (including Windows default and UTF-8) to view them. None of them displayed the accents properly, and different accents are always rendered by the same sequence of strange glyphs.

The temporary workaround: I quickly created a revert commit with Git Extensions and "dcommitted" it.

The question: My company's SVN repository is OK, but now I have the following two problems to solve:

  1. Understand what happened with the characters with accents
  2. Retrieve my work from the SVN history and commit it in a proper way (if possible without reviewing manually all the characters with accents)

Can anybody provide some clues? (I'm rather new to git.)

Samuel Rossille
  • Do you mean that the contents of your text files were changed, not the paths? (I ask because, as far as I know, git-svn treats file contents as byte arrays.) What version of git-svn do you use? – Dmitry Pavlenko May 16 '12 at 22:50
  • Yeah, it's the content of the files that was changed during the operation, not the paths. I update whenever a new version comes out, but I'm not at work right now. I'll tell you the exact version numbers of git and Git Extensions as soon as I can. – Samuel Rossille May 16 '12 at 23:02
  • When git-svn dcommits changes to the repository does the following: – Dmitry Pavlenko May 16 '12 at 23:59
  • I can't wait to read what's after "the following:" in your answer ;=) – Samuel Rossille May 17 '12 at 00:09
  • Sorry, I didn't know that pressing Enter here just posts the comment. Either the interactive rebase or git-svn has spoiled the files. You can check by creating a temporary branch (git co ; git co -b tmpbranch) at the commit that was the latest before you performed the interactive rebase (you can find old commit ids using the "git reflog" command), and redoing that interactive rebase under the same circumstances. After that, check whether your files are OK. Please let me know whether it is a git-svn or a rebase problem. – Dmitry Pavlenko May 17 '12 at 00:18
  • OK, thanks for the 'reflog' tip, which I did not know. Do you actually mean that my old commits were not destroyed by the squash + dcommit process? I can't wait to check this, but I'm at home right now. – Samuel Rossille May 17 '12 at 00:23
  • Git doesn't destroy objects during operations; it just inserts new objects and updates references. Objects are destroyed only when the garbage collector runs (it is often called implicitly, but by default it doesn't prune all unreachable objects). Git keeps all objects reachable from references and from the reflog. Even unreachable objects are (by default) not collected for about 30 days, unless you explicitly call "git prune" or "git gc --prune" or something like it. – Dmitry Pavlenko May 17 '12 at 00:32
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/11383/discussion-between-samuel-rossille-and-dmitry-pavlenko) – Samuel Rossille May 17 '12 at 13:51

1 Answer


And now let's reveal the painful truth (painful for my ego, not for git users): I did mess with the accents, not git.

I could have just deleted the question, since it wrongly suggests that git can mess up accents. But considering the number of upvotes, I think that a lot of people make the same mistake I did, so I have chosen to answer my own question to set the record straight, and maybe help people in the same situation:

  1. Git does not alter any characters other than line breaks (and line breaks only when end-of-line conversion, such as core.autocrlf, is enabled).
  2. I broke the accents before committing, and I did not notice because I did not pay enough attention. What happened: I edited some of the files with Eclipse. Eclipse did not recognize the encoding, and on save every accent was replaced by the same weird byte sequence. That's all.
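
For readers who want to see how this kind of damage can arise, here is a minimal Python sketch of one plausible mechanism. It assumes the files were cp1252 on disk and that the editor read and saved them as UTF-8; neither encoding is confirmed above, and the sample string and function names are hypothetical. Each accented byte is invalid as UTF-8, so it is replaced by U+FFFD on load and written back as the same three bytes on save.

```python
# Assumption: cp1252 stands in for "Windows default encoding" and the
# editor is assumed to have been configured for UTF-8.
FILE_ENCODING = "cp1252"
EDITOR_ENCODING = "utf-8"

# U+FFFD (replacement character) encoded as UTF-8: bytes EF BF BD.
REPLACEMENT_BYTES = "\ufffd".encode("utf-8")


def save_like_misconfigured_editor(raw: bytes) -> bytes:
    """Decode with the wrong encoding, then save again (hypothetical helper)."""
    # errors="replace" turns every undecodable byte into U+FFFD.
    text = raw.decode(EDITOR_ENCODING, errors="replace")
    return text.encode(EDITOR_ENCODING)


def count_replacement_chars(raw: bytes) -> int:
    """Count U+FFFD byte sequences, a hint that a file was damaged this way."""
    return raw.count(REPLACEMENT_BYTES)


# Hypothetical sample content with four different accented characters.
on_disk = "café, à côté".encode(FILE_ENCODING)
damaged = save_like_misconfigured_editor(on_disk)

# Viewing the damaged bytes as cp1252 shows the same glyphs for every accent.
viewed_as_cp1252 = damaged.decode(FILE_ENCODING)

print(count_replacement_chars(on_disk))   # 0: the original file is clean
print(count_replacement_chars(damaged))   # 4: one per accented character
```

Under these assumptions the damage is lossy: é, à and ô all end up as the same bytes, so the original characters cannot be recovered from the damaged file and have to come from an earlier revision.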

Thanks again to Dmitry Pavlenko for giving me pointers on how to investigate this problem.

+1 to "git reflog"

Happy accent fixing ;=)

Samuel Rossille
  • Unfortunately, Eclipse uses different default encodings on Linux and Windows. That alone caused me more trouble than anything else. – Rodrigo Coacci Oct 20 '14 at 17:37
  • I came searching for this because of Eclipse as well. I think my issue was only due to line endings changing, but I wanted to make sure that's all it was. Thank you for your honesty and detailed information :) – kwill Apr 10 '18 at 18:41