3

I want to split a string on a regular expression, but preserve the matches.

I have tried splitting the string on a regex, but that throws away the matches. I have also tried using this, but I am not very good at translating code between languages, let alone from C#.

re := regexp.MustCompile(`\d`)
array := re.Split("ab1cd2ef3", -1)

I need the value of array to be ["ab", "1", "cd", "2", "ef", "3"], but the value of array is ["ab", "cd", "ef"]. No errors.

Jonathan Hall
Isacc Barker

4 Answers

1

The kind of regex support described in the link you pointed to is not available in Go's regexp package. You can read the related discussion.

What you want to achieve (as per the sample given) can be done with a regex that matches either a single digit or a run of non-digits.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "ab1cd2ef3"
    r := regexp.MustCompile(`(\d|[^\d]+)`)
    fmt.Println(r.FindAllStringSubmatch(str, -1))
}

Playground: https://play.golang.org/p/L-ElvkDky53

Output:

[[ab ab] [1 1] [cd cd] [2 2] [ef ef] [3 3]]
peterSO
sahaj
  • Can I do the same for multiple separators? It seems like I can't. `( \(\)|[^ \(\)]+)` – Isacc Barker Jul 20 '19 at 23:28
  • It is just a regex that matches tokens. As long as you can describe each kind of token with its own regex, you can combine as many of them as you like with `|`. If you can provide a sample string, it will be easier to understand. – sahaj Jul 21 '19 at 09:19
0

I don't think this is possible with the current regexp package, but its Split method could easily be extended to behave this way.

This should work for your case:

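// Split works like (*regexp.Regexp).Split, but also keeps each match
// as its own element of the result.
func Split(re *regexp.Regexp, s string, n int) []string {
    if n == 0 {
        return nil
    }

    matches := re.FindAllStringIndex(s, n)
    // parts avoids shadowing the standard "strings" package name.
    parts := make([]string, 0, 2*len(matches)+1)

    beg := 0
    for _, match := range matches {
        if n > 0 && len(parts) >= n-1 {
            break
        }

        // Text between the previous match and this one.
        if match[1] != 0 {
            parts = append(parts, s[beg:match[0]])
        }
        beg = match[1]
        // This also appends the current match
        parts = append(parts, s[match[0]:match[1]])
    }

    // Append the remainder only if there is text after the last match;
    // otherwise "ab1cd2ef3" would end with an empty string.
    if beg != len(s) {
        parts = append(parts, s[beg:])
    }

    return parts
}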

0

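A crude solution: insert a separator around each match, then split on that separator. This assumes the separator never occurs in the input, and a match at the very end of the string leaves a trailing empty element.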

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    input := "ab1cd2ef3"
    sep := "|"

    indexes := re.FindAllStringIndex(input, -1)
    fmt.Println(indexes)

    move := 0
    for _, v := range indexes {
        p1 := v[0] + move
        p2 := v[1] + move
        input = input[:p1] + sep + input[p1:p2] + sep + input[p2:]
        move += 2
    }

    result := strings.Split(input, sep)

    fmt.Println(result)
}
0

You can use a bufio.Scanner:

package main

import (
   "bufio"
   "strings"
)

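// digit is a bufio.SplitFunc that returns each digit as its own token
// and each run of non-digits as a token.
func digit(data []byte, eof bool) (int, []byte, error) {
   for i, b := range data {
      if '0' <= b && b <= '9' {
         if i > 0 {
            return i, data[:i], nil
         }
         return 1, data[:1], nil
      }
   }
   // No digit left: at EOF, emit any trailing non-digit text
   // instead of silently dropping it.
   if eof && len(data) > 0 {
      return len(data), data, nil
   }
   return 0, nil, nil
}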

func main() {
   s := bufio.NewScanner(strings.NewReader("ab1cd2ef3"))
   s.Split(digit)
   for s.Scan() {
      println(s.Text())
   }
}

https://golang.org/pkg/bufio#Scanner.Split

Zombo