I have a regular expression that I've verified in some online regex parsers

https://regexr.com/3h5h8

^(.*\.(?!(htm|html|class|js)$))?[^.]

However, implementing this in Go doesn't match the same way the online regex parser does

package main

import (
    "fmt"
    "regexp"
    "strconv"
)

type ParsedConfigFile struct {
    prefix      string
    instanceNum int
    extension   string
}

// tries to guess config type and instance id from files
func parseFiles(files []string) {
    re := regexp.MustCompile(`(type1|type2)_(\d+)\.(csv|ini)`)
    var matchedFiles []ParsedConfigFile

    for _, file := range files {
        match := re.FindStringSubmatch(file)

        // we have 3 groups we try to capture in the regex + 1 for the full match
        EXPECTED_MATCH_COUNT := 4

        if len(match) == EXPECTED_MATCH_COUNT {
            fmt.Printf("trying: %v\n", file)
            instanceNum, strConvErr := strconv.Atoi(match[2])

            if strConvErr == nil {
                matchedFiles = append(matchedFiles, ParsedConfigFile{
                    prefix:      match[1],
                    instanceNum: instanceNum,
                    extension:   match[3],
                })
            }
        }

    }
}

func main() {
    files := []string{
        "type1_12.ini",          // match
        "type1_121111.ini",      // match
        "type2_1233.csv",        // match
        "type2_32.csv",          // match
        "type1_.ini",            // don't match
        "type2_32.csv.20141027", // don't match
        "type1_12.",             // don't match
        "_12.ini.",              // don't match
        "_12.ini.11",            // don't match
        "type1_81.ini.20141028", // don't match
        "XNGS.csv",              // don't match
    }

    parseFiles(files)
}

Removing the negated set yields some results, but I'm unsure what I have to do to mimic the behavior of the other regex parsers or to ignore matches at the end of the filenames.

playground link https://play.golang.org/p/6HxutLjnLd

Bobloblawlawblogs
  • If you can give an indication of what you actually want the expression to do, I will try to amend my answer with an alternative solution. – Adrian Nov 09 '17 at 20:41
  • There's also two totally different regular expressions here and it's not clear which you're even troubleshooting - the one you quote at the top of the question isn't the same as the one in the source, though the one in the source is the same as the one at the regexr link listed. Which one is this question about? – Adrian Nov 09 '17 at 20:43
  • @adrian apologies for the confusion I edited the question. The regex in the source is what I'm interested in. – Bobloblawlawblogs Nov 11 '17 at 15:29
  • What do you want the expression to actually do? What are you trying to match/exclude? – Adrian Nov 11 '17 at 15:32
  • basically I want to match file names without the date after the extension – Bobloblawlawblogs Nov 13 '17 at 04:03
  • If that's the case, your expressions are doing more than necessary to meet that requirement - `\.\w{3}$` should suffice. – Adrian Nov 13 '17 at 13:13
  • Thanks @Adrian. The `\w{3}$` was what I needed to replace the lookaround operator. If you want to edit your answer to include the new expression I can mark it as the solution – Bobloblawlawblogs Nov 14 '17 at 19:38
  • Updated answer. – Adrian Nov 14 '17 at 19:52

1 Answer

Go's standard library regexp package uses RE2 syntax, which does not support lookaround (e.g. the ?! negative lookahead operator), so a pattern that uses it fails to compile. You can find the complete set of supported regular expression syntax in the documentation: https://golang.org/pkg/regexp/syntax/
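
As a quick check of the point above, here is a minimal sketch that asks the regexp package whether a pattern is accepted. The helper name `compiles` is made up for illustration; `regexp.Compile` is the standard library call that returns an error instead of panicking.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles is a hypothetical helper: it reports whether Go's regexp
// package accepts the given pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// The lookahead expression quoted at the top of the question.
	lookahead := `^(.*\.(?!(htm|html|class|js)$))?[^.]`
	// The lookaround-free expression suggested in this answer.
	simple := `\.\w{3}$`

	fmt.Println("lookahead compiles:", compiles(lookahead))
	fmt.Println("simple compiles:   ", compiles(simple))
}
```

Using `regexp.Compile` rather than `regexp.MustCompile` while experimenting lets you see the error for an unsupported construct without the program panicking.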

If all you need is to ensure that the string ends in a three-character file extension, then you can simplify your expression down to just \.\w{3}$ - a literal period, followed by exactly three word characters (letters, digits, or underscore), followed by the end of the string.
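
To illustrate, here is a small sketch that applies the expression to a few of the file names from the question. The second pattern, `strictName`, is an assumption on my part rather than something stated in the question: it simply anchors the pattern from the question's source with ^ and $, which is one way to also reject names such as "XNGS.csv" that do end in a three-character extension.

```go
package main

import (
	"fmt"
	"regexp"
)

// threeCharExt is the expression from this answer: the name must end in
// a period followed by exactly three word characters.
var threeCharExt = regexp.MustCompile(`\.\w{3}$`)

// strictName is an assumed variant: the pattern from the question's
// source, anchored at both ends so nothing may follow the extension.
var strictName = regexp.MustCompile(`^(type1|type2)_(\d+)\.(csv|ini)$`)

// endsInThreeCharExt reports whether name ends in a three-character extension.
func endsInThreeCharExt(name string) bool {
	return threeCharExt.MatchString(name)
}

// parseName returns the captured prefix, instance number, and extension,
// or ok == false when the whole name does not match strictName.
func parseName(name string) (prefix, num, ext string, ok bool) {
	m := strictName.FindStringSubmatch(name)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	names := []string{
		"type1_12.ini",
		"type2_32.csv.20141027",
		"type1_.ini",
		"XNGS.csv",
	}
	for _, name := range names {
		_, _, _, ok := parseName(name)
		fmt.Printf("%-24s ext=%v strict=%v\n", name, endsInThreeCharExt(name), ok)
	}
}
```

The short expression only checks the extension, so it still accepts "type1_.ini" and "XNGS.csv"; whether you need the stricter anchored form depends on how much of the name you want to validate.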

Adrian