I am working on extracting multiple matches between two strings.

In the example below, I am trying to extract the A B C substring from my string.

Here is my code:

package main
    
import (
    "fmt"
    "regexp"
)
    
func main() {
    str := "Movies: A B C Food: 1 2 3"
    re := regexp.MustCompile(`[Movies:][^Food:]*`)
    match := re.FindAllString(str, -1)
    fmt.Println(match)
}

I am clearly doing something wrong in my regex. I am trying to get the A B C string between Movies: and Food:.

What is the proper regex to get all strings between two strings?

Wiktor Stribiżew
Mystery Man
  • You're matching one character in the first set, and zero or more characters *not* in the second set. I would review basic regex syntax and you should solve this easily. – user3483203 Feb 02 '21 at 21:27
  • I did look at the docs, obviously it wasn't that helpful, which is why I am here. – Mystery Man Feb 02 '21 at 21:35
  • Perhaps this page can be helpful https://stackoverflow.com/questions/5642315/regular-expression-to-get-a-string-between-two-strings-in-javascript – The fourth bird Feb 02 '21 at 21:39
  • @Thefourthbird thanks but that is not in Go language – Mystery Man Feb 02 '21 at 21:40
  • Your problem isn't how you are using Go, your code will work fine once you fix your regex. For such a basic match it won't differ much between engines – user3483203 Feb 02 '21 at 21:42
  • You could use for example https://regex101.com/ and set the language at the left to Golang. Lookarounds are not supported, so you could use a capture group approach which is shown on that page. – The fourth bird Feb 02 '21 at 21:45
  • @Thefourthbird thank you but all of the patterns only show us how to get all characters that are not a number, or all characters that do not contain a certain letter, but is there a damn pattern where it isolates a word and parses the text between that word and another word. Why is this so hard? – Mystery Man Feb 02 '21 at 21:54
  • Consider this broad match using a capture group `\bMovies: (.+?) Food:` which is a bit like `cow (.*?) milk` using a group as on the other page. See https://regex101.com/r/erh9WZ/1 Also, you might read about [groups](https://www.regular-expressions.info/refcapture.html), [quantifiers](https://www.regular-expressions.info/refrepeat.html) and [character classes](https://www.regular-expressions.info/charclass.html) that you used in your pattern. – The fourth bird Feb 02 '21 at 21:59
  • thank you but that returns movies and food still not only the characters in between. – Mystery Man Feb 02 '21 at 22:06
  • It is easy - https://play.golang.org/p/8DhhpY_v5XZ – Wiktor Stribiżew Feb 02 '21 at 22:26
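
To illustrate the first comment's point about character classes, here is a small sketch of what the pattern from the question actually matches. `[Movies:]` is a set that matches one character out of M, o, v, i, e, s and the colon, and `[^Food:]*` matches zero or more characters that are not F, o, d or the colon. The helper name `classMatches` is made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// classMatches (hypothetical name) runs the pattern from the question.
// The brackets define character classes, not literal words.
func classMatches(s string) []string {
	re := regexp.MustCompile(`[Movies:][^Food:]*`)
	return re.FindAllString(s, -1)
}

func main() {
	m := classMatches("Movies: A B C Food: 1 2 3")
	// Each match is one character from the first set followed by
	// a run of characters that are outside the second set.
	fmt.Printf("%q\n", m) // ["M" "ovies" ": A B C " "o" "o" ": 1 2 3"]
}
```

The input is chopped into fragments because neither set describes the words Movies: or Food:, only the individual characters they contain.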

1 Answer

In Go, the RE2-based regexp package does not support lookarounds, so you need a capturing group together with the FindAllStringSubmatch method:

left := "LEFT_DELIMITER_TEXT_HERE"
right := "RIGHT_DELIMITER_TEXT_HERE"
rx := regexp.MustCompile(`(?s)` + regexp.QuoteMeta(left) + `(.*?)` + regexp.QuoteMeta(right))
matches := rx.FindAllStringSubmatch(str, -1)

Note the use of regexp.QuoteMeta that automatically escapes all special regex metacharacters in the left- and right-hand delimiters.

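The (?s) flag makes . match line break characters as well, and (.*?) captures everything between the left and right delimiters into Group 1, stopping at the first occurrence of the right delimiter.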
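
The snippet above can be wrapped into a complete program roughly as follows. This is only a sketch: the function name `between`, its signature, and the sample input are assumptions made for illustration, not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// between (hypothetical helper) returns every substring of s that sits
// between the literal delimiters left and right.
func between(s, left, right string) []string {
	// QuoteMeta escapes metacharacters such as [ ] and . so that the
	// delimiters are matched literally.
	rx := regexp.MustCompile(`(?s)` + regexp.QuoteMeta(left) + `(.*?)` + regexp.QuoteMeta(right))
	var out []string
	for _, m := range rx.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1]) // m[0] is the whole match, m[1] is Group 1
	}
	return out
}

func main() {
	// The delimiters contain regex metacharacters, and the first
	// captured value spans a line break.
	s := "[a.b] first\nline [c] [a.b] second [c]"
	fmt.Printf("%q\n", between(s, "[a.b]", "[c]")) // [" first\nline " " second "]
}
```

Without QuoteMeta, `[a.b]` would be read as a character class, and without (?s) the first value would not be found because . would stop at the line break.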

So, here you can use

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "Movies: A B C Food: 1 2 3"
    r := regexp.MustCompile(`Movies:\s*(.*?)\s*Food`)
    matches := r.FindAllStringSubmatch(str, -1)
    for _, v := range matches {
        fmt.Println(v[1])
    }
}

See the Go demo. Output: A B C.
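
One comment on the question reports that the result still contains Movies and Food. That is what the whole match holds, while the text in between is in Group 1. The sketch below shows both values side by side; the helper name `fullAndGroup` is made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullAndGroup (hypothetical helper) returns the whole match and the
// contents of Group 1 for the first match, or empty strings if none.
func fullAndGroup(s string) (full, group string) {
	r := regexp.MustCompile(`Movies:\s*(.*?)\s*Food`)
	m := r.FindStringSubmatch(s)
	if m == nil {
		return "", ""
	}
	return m[0], m[1]
}

func main() {
	full, group := fullAndGroup("Movies: A B C Food: 1 2 3")
	fmt.Printf("%q\n", full)  // "Movies: A B C Food"
	fmt.Printf("%q\n", group) // "A B C"
}
```

Index 0 of each submatch slice is always the full match, so reading index 1 is what drops the delimiters.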

Wiktor Stribiżew
  • I have also written a general [article about extracting strings between two strings with regex](https://www.buymeacoffee.com/wstribizew/extracting-text-two-strings-regular-expressions); feel free to read it if you are working on a similar problem. – Wiktor Stribiżew Feb 06 '21 at 22:08