With these data examples:

/test -test/test/2016/April 
/test -test/test/2016

How does one pattern match so that it can determine whether or not the number 2016 is located in this exact position?

ΩmegaMan
  • What do you mean by "exact position", and what were your attempts? If the position is just the number of characters, regexp is the wrong tool for this task. – HugoRune Apr 02 '16 at 11:40
  • Are those literal strings `String s1 = /test -test/test/2016/April` or are you implying this is code `String s1 = "/test -test/test/2016/April"?` Also any other edge cases? Can a space occur before the date? This post is too vague. – ΩmegaMan Apr 02 '16 at 12:51
  • By exact position, I meant that 2016 (a four-digit number) will be found after the third forward slash and followed by either nothing or a fourth forward slash. Yes, the strings are literals, so: `String s1 = "/test -test/test/2016/April"` – KenticoLover Apr 04 '16 at 00:05

3 Answers

Assuming that "exact position" means "third position", the following regex would work:

/(?:[^/]*/){2}(\d{4}).*

In C#, this can be used with the Regex constructor and a verbatim string literal (@"..."), which removes the need to escape the backslashes:

var rx = new Regex(@"/(?:[^/]*/){2}(\d{4}).*"); 

If this regex matches a string, the four digits of the year are captured as a result.
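As a minimal sketch of reading that captured value, the snippet below runs the same pattern and prints group 1. The helper name TryGetYear is hypothetical, and the code is written as top-level statements, which assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer, unchanged.
var rx = new Regex(@"/(?:[^/]*/){2}(\d{4}).*");

// Hypothetical helper: returns true when the pattern matches;
// year receives the text captured by group 1 (empty when there is no match).
bool TryGetYear(string path, out string year)
{
    Match m = rx.Match(path);
    year = m.Success ? m.Groups[1].Value : "";
    return m.Success;
}

foreach (var path in new[] { "/test -test/test/2016/April",
                             "/test -test/test/2016",
                             "/test -test/test/April" })
{
    // The first two paths yield 2016; the third has no year and does not match.
    Console.WriteLine(TryGetYear(path, out var year)
        ? $"{path} -> {year}"
        : $"{path} -> no match");
}
```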

Explanation

  • / matches the leading slash character.

  • [^/]* matches any sequence of characters other than a slash.

  • / matches a slash character.

  • The preceding two parts are wrapped in a non-capturing group, which is specified with ?: as the first two characters inside the parentheses.

  • With (?:[^/]*/) now matching a "path segment" like "test/", the group must match exactly two times in a row. That's why the parentheses are followed by the quantifier {2}.

  • Then the actual number must be matched: it consists of four digits in a row. This is represented as follows: (\d{4}), where \d means "any digit" and, once again, the quantifier defines that there should be 4 in a row. The parentheses here form a capturing group.

  • Finally, there can be arbitrary characters after the number ("the path can continue"): this is specified by the . ("match any character") and the quantifier *, which means "any number of occurrences, including none".

Note: There are many dialects of regular expressions. This one works with the C# regex implementation, and it should work in many others as well.
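One caveat: the pattern has no start anchor, so it can also match beginning at a later slash, and the trailing .* accepts extra digits after the year. The asker's comment says the year sits after the third slash and is followed by nothing or a fourth slash. A possible stricter variant under that assumption, shown next to the original for comparison (the variable names loose and strict are illustrative; top-level statements assume C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// Original pattern from the answer.
var loose = new Regex(@"/(?:[^/]*/){2}(\d{4}).*");

// Assumed stricter variant: ^ pins the first slash to the start of the input,
// and the lookahead (?=/|$) requires a slash or the end right after the year.
var strict = new Regex(@"^/(?:[^/]*/){2}(\d{4})(?=/|$)");

string[] samples =
{
    "/test -test/test/2016/April",   // year after the third slash
    "/test -test/test/2016",         // year at the end of the input
    "/a/test -test/test/2016",       // year after the fourth slash
    "/test -test/test/20165",        // five digits, not a four-digit year
};

foreach (var s in samples)
{
    Console.WriteLine($"{s,-30} loose={loose.IsMatch(s),-5} strict={strict.IsMatch(s)}");
}
```

Both patterns accept the first two samples; only the unanchored pattern accepts the last two.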

Anton Harald
  • When I run your pattern against `/ -test/test/2016/April` it fails as a match yet the OP says that is one possibility. – ΩmegaMan Apr 02 '16 at 12:41

A regex pattern can validate content and, as you suggest, validate position as well. The key is to set up pattern anchors based on the text encountered before one gets to the number itself.

In your case there is a literal /, then text, then a literal -, then more text, then a literal /, and so on. By alternating those literal anchors with generic text, you can require a specific position.

Other numbers could show up in that position as noise, though, and you appear to be looking for a year. The following pattern expects /{year of 19XX or 20XX} in that position.

string pattern = @"
^            # Beginning of line (anchor)
/            # / anchor
[^-]+        # Anything not a dash.
-            # Anchor dash
[^/]+        # Anything not a /
/            # / anchor
[^/]+        # Anything not a /
/            # / anchor
[12][90]\d\d # Year hint: 19XX or 20XX (also admits 10XX and 29XX).
";

// IgnorePatternWhitespace *only* allows us to comment the pattern 
// and place it on multiple lines (whitespace in the pattern is ignored);
// it does not affect processing of the data.
// Compiled tells the engine to compile the pattern to IL up front,
// trading startup time for faster repeated matching.
var validator = new Regex(pattern, RegexOptions.IgnorePatternWhitespace | 
                                   RegexOptions.Compiled);

validator.IsMatch("/ -test/test/2016/April"); // True
validator.IsMatch("/ -test/test/2016");       // True
validator.IsMatch("/ -test/test/1985/April"); // True
validator.IsMatch("/ -2017/test/1985/April"); // True
// Negative Tests
validator.IsMatch("/ -2017/test/WTF/April");       // False
validator.IsMatch("/jabberwocky/test/1985/April"); // False, no dash!
validator.IsMatch("////April");                    // False
validator.IsMatch("///2016/April");                // False: no dash and no text between the `/`
validator.IsMatch("/ -test/test/ 2016/April");     // False because pattern 
                                                  // does not allow a space

Pattern Notes

  • Instead of looking for the year with \d\d\d\d, I give the regex parser a more specific hint that the value will be a year in the twentieth century (19XX) or the twenty-first century (20XX). The first two places of the \d\d\d\d pattern are spelled out as sets: [12] for the first digit (1 for a 19XX year, 2 for a 20XX year), followed by [90] for the second digit (a nine or a zero). In a modern computer system most dates fall within these two centuries, so why not craft the regex accordingly? Note that the two sets are independent, so [12][90] also admits 10XX and 29XX.
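For readers who need a tighter check, here is a hedged sketch of a stricter variant. It is not the pattern above: it assumes the first segment must not contain a slash before the dash, replaces [12][90] with (?:19|20), and requires a slash or the end of the input after the year. The name strictValidator is illustrative, and top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

string strictPattern = @"
^              # Beginning of input (anchor)
/              # / anchor
[^/-]+         # Text before the dash: neither a slash nor a dash
-              # Anchor dash
[^/]+          # Anything not a /
/              # / anchor
[^/]+          # Anything not a /
/              # / anchor
(?:19|20)\d\d  # Year: exactly 19XX or 20XX
(?=/|$)        # Next comes a fourth slash or the end of input
";

var strictValidator = new Regex(strictPattern, RegexOptions.IgnorePatternWhitespace);

Console.WriteLine(strictValidator.IsMatch("/ -test/test/2016/April")); // True
Console.WriteLine(strictValidator.IsMatch("/ -test/test/2016"));       // True
Console.WriteLine(strictValidator.IsMatch("/ -test/test/1016/April")); // False: 10XX is rejected
Console.WriteLine(strictValidator.IsMatch("/a/b -c/d/2016"));          // False: slash before the dash
Console.WriteLine(strictValidator.IsMatch("/ -test/test/20165"));      // False: fifth digit after the year
```

The last three inputs are accepted by the [12][90]\d\d pattern above, which is the difference this variant is meant to illustrate.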
ΩmegaMan

You can use this regex:

\-(?:[^\/]+\/){2}(\d+)

It captures the number that appears after the dash and the xx/xx/ pattern; the number of xx/ segments can be adjusted through the {2} quantifier.

Example:

var s1 = "/test -test/test/2016/April";
var s2 = "/test -test/test/2016";

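// Regular string literal, so each backslash in the pattern is doubled.
var rx = new Regex("\\-(?:[^\\/]+\\/){2}(\\d+)");

var m1 = rx.Match(s1);
var m2 = rx.Match(s2);

if (m1.Success && m2.Success) {
    // Group 1 holds the digits captured after the two segments.
    if (m1.Groups[1].Value == m2.Groups[1].Value) {
        Console.WriteLine("s1 == s2");
    }
}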

For the input strings s1 and s2 above, it prints:

s1 == s2
Saleem
  • Why `Multiline`? The pattern does not contain a `^` or `$` which multiline applies to. (The `^` in the not set does not apply) – ΩmegaMan Apr 02 '16 at 12:44
  • Default is `Multiline` regardless of whether you specify it or not, but I made it explicit. – Saleem Apr 02 '16 at 12:55
  • And why `Compiled`? That's for cases where the regex is being used very frequently and performance is critical. In a situation like this, it does more harm than good. – Alan Moore Apr 02 '16 at 12:56
  • That makes no sense, there are no default options in regex. All options have to be specified. If `Multiline` is not specified the `^` is start of buffer and `$` is End of buffer instead of on the lines. – ΩmegaMan Apr 02 '16 at 12:56
  • Good question. I'm using the same regex to match twice and possibly more, so I thought it was better to compile. – Saleem Apr 02 '16 at 12:57
  • @OmegaMan. Please visit https://regex101.com/r/bU3eJ3/1 and try experimenting by adding or removing `m`. see if it makes any difference. – Saleem Apr 02 '16 at 12:59
  • Usage of multiline in your pattern is not needed because you don't use either `^` or `$`. Going to regex101 is superfluous because the pattern does not apply. Yes your pattern works, I get it. But that doesn't change the fact that `Multiline` is not needed. Don't confuse a learner with unnecessary/unrelated items. (Not an attack...just saying) :-) – ΩmegaMan Apr 02 '16 at 13:01
  • @OmegaMan, no no. You have a valid point. I take it as positive criticism and I'm always open to it. – Saleem Apr 02 '16 at 13:10
  • See [this question](http://stackoverflow.com/questions/513412/how-does-regexoptions-compiled-work) about what it means to "compile" a regex. tl;dr: You shouldn't use `RegexOptions.Compiled` unless you know you need it. – Alan Moore Apr 02 '16 at 13:11
  • @AlanMoore if the regex is going to be used more than once, compilation is a good thing. I can see that it's questionable if only used 2-3 times...would it really make a difference? – ΩmegaMan Apr 02 '16 at 13:12
  • @AlanMoore (O_O). Thanks for pointing to an excellent resource. Point taken from both gurus. This is a simple regex, so it will not get any benefit from compilation. However, if a regex is complex and repetitive in nature, `RegexOptions.Compiled` is the way to go. – Saleem Apr 02 '16 at 13:15
  • OMG! You guys are amazing! What do you suggest I read to help me get it? – KenticoLover Apr 04 '16 at 00:17
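The `Multiline` point discussed in the comments above can be checked with a short sketch. It relies on .NET's documented behavior: a pattern gets RegexOptions.None unless options are passed, and `Multiline` only changes what `^` and `$` match. The input string and variable names are made up for illustration; top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical two-line input: the path sits on the second line.
string input = "first line\n/test -test/test/2016";

// Without options, ^ matches only at the very start of the input.
bool anchoredDefault = Regex.IsMatch(input, @"^/test");
// With Multiline, ^ also matches right after each newline.
bool anchoredMultiline = Regex.IsMatch(input, @"^/test", RegexOptions.Multiline);

// Pattern from the answer: it contains no ^ or $, so Multiline has nothing to change.
string answerPattern = @"\-(?:[^\/]+\/){2}(\d+)";
bool withoutOption = Regex.IsMatch(input, answerPattern);
bool withOption = Regex.IsMatch(input, answerPattern, RegexOptions.Multiline);

Console.WriteLine($"^/test, no options: {anchoredDefault}");    // False
Console.WriteLine($"^/test, Multiline:  {anchoredMultiline}");  // True
Console.WriteLine($"answer pattern:     {withoutOption} / {withOption}"); // True / True
```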