
I'm trying to remove non-printable characters from a string in Golang.

https://play.golang.org/p/Touihf5-hGH

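```go
package main

import "fmt"

func main() {
	// The last character is invisible (written here as the escape \u200b).
	invisibleChars := "Douglas\u200b"
	fmt.Println(invisibleChars)
	fmt.Println(len(invisibleChars))

	normal := "Douglas"
	fmt.Println(normal)
	fmt.Println(len(normal))
}
```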

Output:

```
Douglas​
10
Douglas
7
```

The first string has an invisible character at the end, which is why its length is 10 bytes instead of 7.

I've tried removing non-ASCII characters, but that removes accented letters too.

How can I remove non-printable characters only?

icza
fonini
  • Refer to this [what-is-the-range-of-unicode-printable-characters](https://stackoverflow.com/questions/3770117/what-is-the-range-of-unicode-printable-characters). And answer from icza is great. – mbyd916 Nov 22 '19 at 12:33



Foreword: I released this utility in my github.com/icza/gox library, see stringsx.Clean().


You could remove runes for which unicode.IsGraphic() or unicode.IsPrint() reports false. To remove certain runes from a string, you can use strings.Map(): when the mapping function returns a negative value, that rune is dropped from the result.

For example:

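```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

func main() {
	invisibleChars := "Douglas\u200b" // ends with a zero-width space (U+200B)
	fmt.Printf("%q\n", invisibleChars)
	fmt.Println(len(invisibleChars))

	// Keep runes accepted by unicode.IsGraphic; returning -1 drops the rune.
	clean := strings.Map(func(r rune) rune {
		if unicode.IsGraphic(r) {
			return r
		}
		return -1
	}, invisibleChars)

	fmt.Printf("%q\n", clean)
	fmt.Println(len(clean))

	// Same filter using unicode.IsPrint, which accepts a subset of IsGraphic.
	clean = strings.Map(func(r rune) rune {
		if unicode.IsPrint(r) {
			return r
		}
		return -1
	}, invisibleChars)

	fmt.Printf("%q\n", clean)
	fmt.Println(len(clean))
}
```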

This outputs (try it on the Go Playground):

```
"Douglas\u200b"
10
"Douglas"
7
"Douglas"
7
```
icza
You can trim non-graphic runes from both ends of the string with strings.TrimFunc:

```go
invisibleChars = strings.TrimFunc(invisibleChars, func(r rune) bool {
	return !unicode.IsGraphic(r)
})
```

Go Playground: https://play.golang.org/p/39yWgnnRPXr
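strings.TrimFunc removes matching runes only from the beginning and the end of the string, so it cleans invisible characters on either side of the text but leaves one that sits between other characters. A minimal sketch of both cases; the helper name trimNonGraphic and the sample strings are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// trimNonGraphic is a hypothetical helper that strips non-graphic runes
// from both ends of s; runes in the middle are not examined.
func trimNonGraphic(s string) string {
	return strings.TrimFunc(s, func(r rune) bool {
		return !unicode.IsGraphic(r)
	})
}

func main() {
	// Zero-width spaces (U+200B) at both edges are removed.
	fmt.Printf("%q\n", trimNonGraphic("\u200bDouglas\u200b"))

	// A zero-width space between two words is kept, because it is not at an edge.
	fmt.Printf("%q\n", trimNonGraphic("Douglas\u200bbar"))
}
```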


Just FYI:

I often use strings.TrimFunc, but I have found that strings.Map() handles invisible characters better, because strings.TrimFunc only removes runes from the beginning and end of the string.

For example, strings.TrimFunc cannot clean "Douglas\u200b" + "bar", where the invisible character is followed by more text. The following example fails: the resulting length is 13 rather than 10.

```go
func ExampleTrimFunc() {
    invisibleChars := "Douglas\u200b" + "bar"
    invisibleChars = strings.TrimFunc(invisibleChars, func(r rune) bool {
        return !unicode.IsGraphic(r)
    })

    fmt.Println(invisibleChars)
    fmt.Println(len(invisibleChars))

    normal := "Douglasbar"
    fmt.Println(normal)
    fmt.Println(len(normal))

    // Output:
    // Douglasbar
    // 10
    // Douglasbar
    // 10
}
```

However, switching to strings.Map() as follows makes the example pass:

```diff
 func ExampleTrimFunc() {
    invisibleChars := "Douglas\u200b" + "bar"
-   invisibleChars = strings.TrimFunc(invisibleChars, func(r rune) bool {
-       return !unicode.IsGraphic(r)
-   })
+   invisibleChars = strings.Map(func(r rune) rune {
+       if unicode.IsGraphic(r) {
+           return r
+       }
+       return -1
+   }, invisibleChars)
 
    fmt.Println(invisibleChars)
    fmt.Println(len(invisibleChars))
 
    normal := "Douglasbar"
    fmt.Println(normal)
    fmt.Println(len(normal))
 
    // Output:
    // Douglasbar
    // 10
    // Douglasbar
    // 10
 }
```
KEINOS