
I'm trying to make case folding consistent across three languages (C++, Python and Golang), because I need to check whether a string matches a saved one regardless of which language does the comparison.

One problematic example is the German word "grüßen", which in uppercase is "GRÜSSEN" (note that the 'ß' becomes the two characters 'SS').

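Is there some way to do this that I'm missing, or does the bug noted at the end of the `unicode` package documentation (there is no mechanism for full case folding, that is, for characters that involve multiple runes in the input or output) apply to all text conversion in Golang? If so, what are my options for case folding other than writing it in cgo?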

Shawn Blakesley
  • Given golang implements the capitalisation function as `func to(_case int, r rune, caseRange []CaseRange) rune {`, is it even possible to return multiple runes at all? – zerkms Mar 28 '17 at 03:23
  • Yeah, that's what I'm trying to get at. There are languages where one "rune" can become two through case folding / capitalization, so there should be a way to handle such a thing in golang. – Shawn Blakesley Mar 28 '17 at 03:31
  • If you end up creating an issue, could you please post a link here? I don't think there is anything there to properly convert it. – zerkms Mar 28 '17 at 03:32
  • Will do. I just didn't want to create an issue until I had done more research / reached out for help. – Shawn Blakesley Mar 28 '17 at 03:33
  • Interesting, and kinda relevant: the full case foldings (status F) in http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt are the tricky Unicode code points (and won't work in Go). – zerkms Mar 28 '17 at 03:40
  • Not in the core: please look at what [golang.org/x/text](https://godoc.org/golang.org/x/text) can do for you. – kostix Mar 28 '17 at 04:17
  • Awesome! Thanks kostix. If you turn that into an answer I will accept it. Basically using `import "golang.org/x/text/cases"` I can do `c := cases.Fold()` then `c.String("grüßen")` and it works. – Shawn Blakesley Mar 28 '17 at 04:32
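The comments hinge on the fact that full case folding maps one code point to several, which a function returning a single `rune` cannot express. The sketch below shows the shape of a string-to-string fold instead. It assumes a hand-picked two-entry excerpt of the status-F lines of CaseFolding.txt and a hypothetical helper name `foldSketch`; it falls back to `unicode.ToLower`, which only approximates the remaining foldings, so it is an illustration rather than a substitute for `golang.org/x/text/cases`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// fullFolds is a tiny excerpt of the status-F (full) entries in
// CaseFolding.txt: code points whose folded form is more than one rune.
// A real table is much larger; this is for illustration only.
var fullFolds = map[rune]string{
	'\u00DF': "ss", // ß: 00DF; F; 0073 0073
	'\u1E9E': "ss", // ẞ: 1E9E; F; 0073 0073
}

// foldSketch (hypothetical) emits a string per input rune, so one rune may
// expand to several. Runes not in fullFolds go through unicode.ToLower,
// which is only an approximation of the single-rune (status C) foldings.
func foldSketch(s string) string {
	var b strings.Builder
	for _, r := range s {
		if f, ok := fullFolds[r]; ok {
			b.WriteString(f)
			continue
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

func main() {
	fmt.Println(foldSketch("grüßen"))  // grüssen
	fmt.Println(foldSketch("GRÜSSEN")) // grüssen
}
```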

1 Answer


Advanced (Unicode-enabled) text processing is not part of the Go stdlib,¹ and exists in the form of a host of ("blessed") third-party packages under the golang.org/x/text/ umbrella.

As Shawn figured out by himself, one can do

package main

import (
  "fmt"

  "golang.org/x/text/cases"
)

func main() {
  c := cases.Fold()
  fmt.Println(c.String("grüßen")) // grüssen
}

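to get "grüssen" back. Folding "GRÜSSEN" yields the same "grüssen", so comparing the folded strings treats the two words as equal.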


¹ That's because whatever ships in the stdlib is subject to the Go 1 compatibility promise. When Go 1 was released, some of this functionality wasn't available yet, was incomplete, or had APIs still in flux, so it was kept out of the core to let it mature.

Shawn Blakesley
kostix