I have the strings ["02-03-2013#3rd Party Fuel", "-1#Archived", "2#06-23-2013#Newswire"], which I want to break down into several parts. Each string is prefixed with a date key, an index key, or both, followed by a name.

I've designed a RegEx that matches each key properly on its own. However, when I try to match the index key, date key, and name in one fell swoop, only the first key is found. It seems the recursive group isn't working as I expect it should.

private const string INDEX_KEY_REGEX = @"(?<index>-?\d+)";
private const string DATE_KEY_REGEX = @"(?<date>(?:0?[1-9]|1[012])-(?:0?[1-9]|[12]\d|3[01])-\d{4})";
private const string KEY_SEARCH_REGEX = @"(?<R>(?:^|(?<=#))({0})#(?(R)))(?<name>.*)";

private string Name = "2#06-23-2013#Newswire";
... = Regex.Replace(
    Name,
    String.Format(KEY_SEARCH_REGEX, INDEX_KEY_REGEX + "|" + DATE_KEY_REGEX),
    "${index}, ${date}, ${name}"
);

// These are the current results for all strings when set into the Name variable.

// Correct Result: ", 02-03-2013, 3rd Party Fuel"
// Correct Result: "-1, , Archived"
// Invalid Result: "2, , 06-23-2013#Newswire"
// Should be: "2, 06-23-2013, Newswire"

Does a keen eye see something I've missed?


Final Solution As I Needed It

It turns out I didn't need a recursive group. I simply needed a zero-or-more repetition of the key group. Here is the full RegEx.

(?:(?:^|(?<=#))(?:(?<index>-?\d+)|(?<date>(?:0?[1-9]|1[012])-(?:0?[1-9]|[12]\d|3[01])-(\d{2}|\d{4})))#)*(?<name>.*)

And the segmented RegEx:

private const string INDEX_REGEX = @"(?<index>-?\d+)";
private const string DATE_REGEX = @"(?<date>(?:0?[1-9]|1[012])-(?:0?[1-9]|[12]\d|3[01])-(\d{2}|\d{4}))";
private const string KEY_WRAPPER_REGEX = @"(?:^|(?<=#))(?:{0})#";
private const string KEY_SEARCH_REGEX = @"(?:{0})*(?<name>.*)";
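
For illustration, here is a minimal sketch of how the segmented constants above could be composed and used. The class name `KeyedNameParser` and the helper `Split` are hypothetical, not part of the original code. The sketch uses `Regex.Match` rather than `Regex.Replace`, on the assumption that only the first match is wanted: because this pattern can match an empty string, a replace may also report a second, empty match at the end of the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyedNameParser
{
    private const string INDEX_REGEX = @"(?<index>-?\d+)";
    private const string DATE_REGEX = @"(?<date>(?:0?[1-9]|1[012])-(?:0?[1-9]|[12]\d|3[01])-(\d{2}|\d{4}))";
    private const string KEY_WRAPPER_REGEX = @"(?:^|(?<=#))(?:{0})#";
    private const string KEY_SEARCH_REGEX = @"(?:{0})*(?<name>.*)";

    // Compose the full pattern: (index|date) goes into the wrapper,
    // and the wrapper goes into the zero-or-more search pattern.
    private static readonly Regex KeySearch = new Regex(
        String.Format(
            KEY_SEARCH_REGEX,
            String.Format(KEY_WRAPPER_REGEX, INDEX_REGEX + "|" + DATE_REGEX)));

    // Hypothetical helper: returns "index, date, name".
    // A key that is absent from the input yields an empty string.
    public static string Split(string input)
    {
        Match m = KeySearch.Match(input);
        return m.Groups["index"].Value + ", "
             + m.Groups["date"].Value + ", "
             + m.Groups["name"].Value;
    }
}
```

Under these assumptions, `KeyedNameParser.Split("2#06-23-2013#Newswire")` should give `2, 06-23-2013, Newswire`, and the keys may appear in either order.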
roydukkey
  • It works for this test case if you remove the `^|` test. What's the intent of that? – Bobson Apr 10 '13 at 16:17
  • That was to ensure the keys are always at the beginning and never at the end or preceded by an undetermined string. I'll test without it, but I think it will break the stated requirement. I'll see what I find. – roydukkey Apr 10 '13 at 16:22
  • In my testing I get the following results: `2#, 06-23-2013, Newswire` and `-1#Archived` (no matches). So that's farther from the desired output. – roydukkey Apr 10 '13 at 16:28
  • What's the desired result for all three of your tests? I just know the one. – Bobson Apr 10 '13 at 16:35
  • I've added the current results to the question. – roydukkey Apr 10 '13 at 16:42
  • Thanks. I'll see if I can make it work. – Bobson Apr 10 '13 at 16:46

1 Answer

Well, the individual regexes break down like this:

Index: captures a single positive or negative number (an optional -, followed by one or more digits).

Date: captures a date string in the specified format, separated with -. No allowance is made for any other date format. Note that the leading '#' and trailing '#' are not handled; it specifically captures the date, and only the date.

R: matches the beginning of the line OR a position right after a #, then the format placeholder that gets replaced to make it one BIG regex, then a literal #, then a conditional that has no false branch and whose true branch doesn't do anything either.

Name: captures whatever is left.

Final result, compiled into a single regex: two captures, R and name.

R (4 parts):
R-1: match either the beginning of the line or a #
R-2: get EITHER the date or the index (but never both)
R-3: match #
R-4: empty conditional expression

name: match whatever is left.

The issue seems to be that a single match never captures both the index and the date.

Final edit: working regex

Bear with me, this thing is nasty. You have to account for all 4 possibilities, or it won't match every possible case. I couldn't figure out any way to generalize it.

(?:(?<index>-?\d+(?!\d-))#(?<date>(?:0?[1-9]|1[012])-(?:0?[1-9]|[12]\d|3[01])-\d{4})|(?<date>(?:0?[1-9]|1[012])-(?:0?[1-9]|[12]\d|3[01])-\d{4})#(?<index>-?\d+)|(?!-?\d+#)(?<date>(?:0?[1-9]|1[012])-(?:0?[1-9]|[12]\d|3[01])-\d{4})|(?<index>-?\d+)(?!#(?:0?[1-9]|1[012])-(?:0?[1-9]|[12]\d|3[01])-\d{4}))#(?<name>.*)

Ugly, I know. It has 4 initial alternatives.

1a) capture <index>#<date>  OR
1b) capture <date>#<index>  OR
1c) capture <date> only, as long as it's not preceded by an index  OR
1d) capture <index> only, as long as it's not followed by a date
...
2) match but ignore #
3) capture <name>

It works in all 4 cases.
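
As a rough check of the four alternatives, the sketch below rebuilds the same pattern from a shared date fragment and runs it with `Regex.Match`. The names `FourCaseParser`, `D`, and `Split` are hypothetical. The day alternative is written as `3[01]` here, and the sketch relies on .NET allowing the same group name to appear in several alternatives.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FourCaseParser
{
    // Shared date fragment: MM-DD-YYYY with optional leading zeros.
    private const string D = @"(?:0?[1-9]|1[012])-(?:0?[1-9]|[12]\d|3[01])-\d{4}";

    private static readonly Regex Pattern = new Regex(
        "(?:"
        + @"(?<index>-?\d+(?!\d-))#(?<date>" + D + ")"  // 1a) index#date
        + "|(?<date>" + D + @")#(?<index>-?\d+)"        // 1b) date#index
        + @"|(?!-?\d+#)(?<date>" + D + ")"              // 1c) date only
        + @"|(?<index>-?\d+)(?!#" + D + ")"             // 1d) index only
        + ")#(?<name>.*)");                             // 2) '#', 3) name

    // Hypothetical helper: returns "index, date, name".
    // A key that is absent from the input yields an empty string.
    public static string Split(string input)
    {
        Match m = Pattern.Match(input);
        return m.Groups["index"].Value + ", "
             + m.Groups["date"].Value + ", "
             + m.Groups["name"].Value;
    }
}
```

Note that the pattern is not anchored, so this sketch assumes the keys start at the beginning of the input, as they do in the sample strings.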

Final: Final Edit

There is a way to do this using 3 regexes instead of just 1, which might end up being cleaner.

//note: index MIGHT be preceded by, and is ALWAYS followed by, a #
indexRegex = @"((?=#)?(?<!\d|-)-?\d+(?=#))";
//same with date
dateRegex = @"((?=#)?(?:0?[1-9]|1[012])-(?:0?[1-9]|[12]\d|3[01])-\d{4}(?=#))";
//then name
nameRegex = @"(?:.*#){1,2}(.*)";

Run each of them separately against the input to get the individual values, then rebuild the string.
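
A minimal sketch of the three-regex approach follows. It uses `Regex.Match` to pull out each value, which is an assumption about how the patterns are meant to be run. The optional `(?=#)?` prefix is left out of the index and date patterns because an optional lookahead does not constrain the match. The names `SeparateKeyParser` and `Rebuild` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparateKeyParser
{
    // Index: not preceded by a digit or '-', and always followed by '#'.
    private static readonly Regex IndexRegex =
        new Regex(@"(?<!\d|-)-?\d+(?=#)");

    // Date: MM-DD-YYYY, always followed by '#'.
    private static readonly Regex DateRegex =
        new Regex(@"(?:0?[1-9]|1[012])-(?:0?[1-9]|[12]\d|3[01])-\d{4}(?=#)");

    // Name: whatever follows the last '#'.
    private static readonly Regex NameRegex =
        new Regex(@"(?:.*#){1,2}(.*)");

    // Hypothetical helper: runs each regex on its own and rebuilds
    // "index, date, name". A failed match contributes an empty string.
    public static string Rebuild(string input)
    {
        string index = IndexRegex.Match(input).Value;
        string date = DateRegex.Match(input).Value;
        string name = NameRegex.Match(input).Groups[1].Value;
        return index + ", " + date + ", " + name;
    }
}
```

One design consequence of this approach is that each pattern stays short and can be tested in isolation, at the cost of scanning the input three times.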

Nevyn
  • That looks mostly correct, but doesn't the `+` in the `INDEX_KEY_REGEX` signify that the regex will match one or more `\d`? – roydukkey Apr 10 '13 at 16:50
  • Currently, that only works for a single digit index, and might mess up the date...still working on it. – Nevyn Apr 10 '13 at 16:57
  • This is extremely close to what is needed, however, it doesn't account for cases where the date key may be before the index key, `06-23-2013#2#Newswire`. This is the reason I thought to try a recursive statement. How would you handle this? Sorry for not explaining it earlier. – roydukkey Apr 10 '13 at 17:00
  • @roydukkey - Your data is **ugly**. But that's what Regexs are for. – Bobson Apr 10 '13 at 17:00
  • oo, never realized that date and index could reverse their order there...very nasty...are those the only 2 that could move? – Nevyn Apr 10 '13 at 17:05
  • @Nevyn - I'm impressed. Nice job. – Bobson Apr 10 '13 at 19:26
  • @Nevyn Likewise, I'm impressed too. Thank you! – roydukkey Apr 10 '13 at 19:33
  • @Bobson truth be told, I'm still annoyed that I couldn't find a way to get it any shorter; that thing is ugly. If anyone else can figure out a way to shrink it, feel free. – Nevyn Apr 10 '13 at 19:34