I'm trying to check that a string is comma-separated "correctly", meaning that

validString := "a, b, c, d, e"

is a valid comma-separated string, as every item is separated from the next by exactly one comma.

Here's an invalid string, as it contains multiple commas and a semicolon:

invalidString1 := "a,, b, c,,,,d, e,,; f, g"

Here's another invalid string, as it contains `,  ,` (two commas with only whitespace between them) and has no comma between a and b:

invalidString2 := "a  b,  , c, d"

My first idea was to use the "regexp" package to check that the string is valid, using the regex patterns discussed elsewhere: Regex for Comma delimited list

package main
import (
    "fmt"
    "regexp"
)
func main() {
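    // A raw string literal (backticks) keeps \d and \s from being read as string escapes.
    r := regexp.MustCompile(`(\d+)(,\s*\d+)*`)
    fmt.Println(r) // the pattern compiles, but how is it used to validate a string?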
}

However, I don't understand how we would use this to "validate" strings...that is, either classify the string as valid or invalid based on these regex patterns.

EB2127

2 Answers

2

Once you have the regular expression compiled you can use Match (or MatchString) to check if there is a match e.g. (using a slightly modified regular expression so your examples work):

package main

import (
    "fmt"
    "regexp"
)

func main() {
    r := regexp.MustCompile(`^(\w+)(,\s*\w+)*$`)
    fmt.Println(r.MatchString("a, b, c, d, e"))
    fmt.Println(r.MatchString("a,, b, c,,,,d, e,,; f, g"))
    fmt.Println(r.MatchString("a  b,  , c, d"))
}

Try it in the go playground.

There are plenty of other ways of checking for a valid comma-separated list; which is best depends upon your use case. If you are loading the data into a slice (or other data structure) then it's often simplest to just check the values as you process them into the structure.
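As a rough illustration of checking values while loading them into a slice, here is a minimal sketch. The helper name parseList and its rules (no empty items, no whitespace or semicolons inside an item) are assumptions made for this example, not something the question specifies.

package main

import (
	"fmt"
	"strings"
)

// parseList (hypothetical helper) splits s on commas, trims the whitespace
// around each item, and returns an error if an item is empty or contains
// inner whitespace or a semicolon. The rules are assumptions for this sketch.
func parseList(s string) ([]string, error) {
	parts := strings.Split(s, ",")
	items := make([]string, 0, len(parts))
	for i, p := range parts {
		item := strings.TrimSpace(p)
		if item == "" {
			return nil, fmt.Errorf("item %d is empty", i)
		}
		if strings.ContainsAny(item, " \t;") {
			return nil, fmt.Errorf("item %d (%q) contains whitespace or ';'", i, item)
		}
		items = append(items, item)
	}
	return items, nil
}

func main() {
	for _, s := range []string{
		"a, b, c, d, e",
		"a,, b, c,,,,d, e,,; f, g",
		"a  b,  , c, d",
	} {
		items, err := parseList(s)
		fmt.Println(items, err)
	}
}

With this approach the error also tells you which item was the problem, which a plain regex match does not.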

Brits
  • Strings containing `-` or `_` should be ok, e.g. this is valid: `cat, dog-go, dog_fish` – EB2127 Sep 06 '20 at 17:21
  • If you want to match other strings then change the regular expression (e.g. something like `^(\w|-|_)+(,\s*(\w|-|_)+)*$`). You will need to take into account where the characters can appear too (with the current expression "a, b, c" is valid but "a, b , c" is not). – Brits Sep 06 '20 at 20:18
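
To make the point in the comments above concrete, here is a sketch of a pattern that accepts hyphens and underscores inside items and also tolerates whitespace before a comma. The variable name listPattern and the exact pattern are assumptions for illustration; adjust the character class to whatever your items may contain.

package main

import (
	"fmt"
	"regexp"
)

// listPattern is an assumed pattern: each item is one or more word characters
// (letters, digits, underscore) or hyphens, and items are separated by exactly
// one comma with optional whitespace on either side.
var listPattern = regexp.MustCompile(`^[\w-]+(\s*,\s*[\w-]+)*$`)

func main() {
	for _, s := range []string{
		"cat, dog-go, dog_fish",
		"a, b , c",
		"a,, b",
		"a  b, c",
	} {
		fmt.Printf("%q -> %v\n", s, listPattern.MatchString(s))
	}
}

The first two inputs match and the last two do not, because an empty item or a missing comma leaves text that the pattern cannot consume before the end anchor.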
2

If performance is a key factor, you can remove all whitespace from the string and then use the following algorithm:

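// validate reports whether every odd byte position in str holds a comma.
// It assumes whitespace has already been removed and that every item is a
// single ASCII character.
func validate(str string) bool {
    for i, c := range str {
        // Odd positions must hold the separator.
        if i%2 != 0 && c != ',' {
            return false
        }
    }
    return true
}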

Here is the benchmark:

BenchmarkValidate-8         362536340            3.27 ns/op        0 B/op          0 allocs/op
BenchmarkValidateRegex-8    13636486            87.4 ns/op         0 B/op          0 allocs/op

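Note:
The procedure works only if every item is a single character and the whitespace has been removed, because it relies on the input being a sequence of "CHARACTER-SYMBOL-CHARACTER-SYMBOL". It only checks that the odd positions hold a comma, so it does not reject a comma in an even position or a trailing comma.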
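
If items can be longer than one character, a single pass without a regex is still possible. The following is only a sketch: validateList, isItemRune, and the accepted character set (letters, digits, `_`, `-`, with spaces or tabs allowed around commas) are assumptions for this example, and it has not been benchmarked against the numbers above.

package main

import (
	"fmt"
	"unicode"
)

// States of the scanner (names are hypothetical).
const (
	expectItem = iota // at the start, or just after a comma
	inItem            // inside an item
	afterItem         // whitespace seen after an item; only a comma may follow
)

// isItemRune reports whether c may appear inside an item (assumed rule).
func isItemRune(c rune) bool {
	return unicode.IsLetter(c) || unicode.IsDigit(c) || c == '_' || c == '-'
}

// validateList reports whether s is a list of non-empty items separated by
// single commas, with optional spaces or tabs around the commas.
func validateList(s string) bool {
	state := expectItem
	for _, c := range s {
		switch {
		case c == ',':
			if state == expectItem {
				return false // empty item: leading or doubled comma
			}
			state = expectItem
		case c == ' ' || c == '\t':
			if state == inItem {
				state = afterItem
			}
		case isItemRune(c):
			if state == afterItem {
				return false // two items with no comma between them
			}
			state = inItem
		default:
			return false // any other character, such as ';'
		}
	}
	return state != expectItem // rejects "" and a trailing comma
}

func main() {
	fmt.Println(validateList("a, b, c, d, e"))
	fmt.Println(validateList("a,, b, c,,,,d, e,,; f, g"))
	fmt.Println(validateList("a  b,  , c, d"))
}

Unlike validate, this variant does not need the whitespace stripped first, so it can still detect a missing comma such as "a  b".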


Code for the benchmark


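// Assumed to live in a _test.go file in the same package as validate.
import (
    "regexp"
    "strings"
    "testing"
)

func BenchmarkValidate(b *testing.B) {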
    invalidString1 := "a,, b, c,,,,d, e,,; f, g"
    invalidString1 = strings.ReplaceAll(invalidString1, " ", "")
    for x := 0; x < b.N; x++ {
        validate(invalidString1)
    }
}

func BenchmarkValidateRegex(b *testing.B) {
    r := regexp.MustCompile(`^(\w+)(,\s*\w+)*$`)
    invalidString1 := "a,, b, c,,,,d, e,,; f, g"
    for x := 0; x < b.N; x++ {
        r.MatchString(invalidString1)
    }
}
alessiosavi
  • No, but with the benchmark input shared above we can see that the simple `validate` method is roughly 27 times faster than executing the regex (3.27 ns/op vs. 87.4 ns/op) – alessiosavi Sep 06 '20 at 15:17