
I can't seem to separate filenames from this format using a regex pattern with named captures:

Files[Some File 1 [0100152000022000][v0].txt[7268474425]Some File Two [0100152000022800][v720896].txt[661204928]Some File Three [Extra Info Ignored][0100152000023001][v0].txt[121034]]

Using pattern:

^Files\[((?'name'(.*?))\[(?'length'(\d+))\],?)*\]$

It usually works, but when a filename contains extra ']' and '[' characters, the parsing breaks: the name ends too early and the following portion is split off as a separate filename.

How can I support extra ']' and '[' in filenames while retaining the matched result?

C# parsing code:

string text = "Files[Some File 1 [0100152000022000][v0].txt[7268474425]Some File Two [0100152000022800][v720896].txt[661204928]Some File Three [Extra Info Ignored][0100152000023001][v0].txt[121034]]";
var regex = new Regex(@"^Files\[((?'name'(.*?))\[(?'length'(\d+))\],?)*\]$");
var match = regex.Match(text);
var names = match.Groups["name"].Captures.Cast<Capture>();
var lengths = match.Groups["length"].Captures.Cast<Capture>();
var filelist = names.Zip(lengths, (f, n) => new { file = f.Value, length = long.Parse(n.Value) }).ToArray();
WLFree
  • Consider using split: https://dotnetfiddle.net/fazhjA – Rand Random Oct 09 '22 at 23:11
  • File Name.txt[Whatever][453543].txt <-- I need captured: File Name.txt, 453543 – WLFree Oct 09 '22 at 23:21
  • Use a capture group in the split: https://dotnetfiddle.net/aa95Ei - notice the () in the split's regex – Rand Random Oct 09 '22 at 23:28
  • Yes! This got it working – WLFree Oct 09 '22 at 23:46
  • Does this answer your question? [Split a string by another string in C#](https://stackoverflow.com/questions/2245442/split-a-string-by-another-string-in-c-sharp) – BurnsBA Oct 10 '22 at 14:48
  • From a performance standpoint, that is a downgrade. I use a while loop and a MemoryStream for this; it's much faster. – WLFree Oct 10 '22 at 16:49
  • @Rand Random that solution works if I want to iterate over each field one by one, but I want pairs (A and B before moving on to C and D, for example). That's where 3 hours with your code didn't produce the expected result, though it came very close. – WLFree Oct 10 '22 at 16:50
  • Here you can see how to pair them: https://dotnetfiddle.net/hL7sLu (the last output) - you can modify the second part by simply redefining the capture group in the split's regex, e.g. https://dotnetfiddle.net/z1n5Io – Rand Random Oct 10 '22 at 17:07
  • You can see what I'm trying to do here (we got this far, see the UPDATE header) - https://stackoverflow.com/questions/74010610/regex-how-to-match-2-fields – WLFree Oct 10 '22 at 17:10
  • I will run the code (I spent hours on the first dotnetfiddle lol, it was insane). The issue with your code came from putting a *? in place of the ".txt" in the expression. That made the match end too early and split the two halves of a single filename across 2 iterations, with the rest produced in the same fashion. File extensions are 'Unknown' in this context – WLFree Oct 10 '22 at 17:11
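
For reference, the split-with-capture-group idea from the comments can be sketched roughly as below. This is not the code from the linked fiddles; `SplitSketch` and `Parse` are hypothetical names. It rests on an assumption about the data that the question does not confirm: a `[digits]` group counts as a length field only when it is followed by an ordinary filename character or by the end of the list, never by another `[` or `]`. Under that assumption no file extension needs to be known.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // Hypothetical helper: returns (file, length) pairs from "Files[...]".
    public static List<(string File, long Length)> Parse(string text)
    {
        const string prefix = "Files[";
        if (!text.StartsWith(prefix, StringComparison.Ordinal) ||
            !text.EndsWith("]", StringComparison.Ordinal))
        {
            throw new FormatException("Expected input of the form Files[...]");
        }

        // Drop the "Files[" prefix and the final "]" that closes the list.
        string body = text.Substring(prefix.Length, text.Length - prefix.Length - 1);

        // Assumption: a length is a [digits] group NOT followed by '[' or ']'.
        // The capture group makes Regex.Split keep the digits in the result,
        // so the array alternates: name, length, name, length, ..., "".
        string[] parts = Regex.Split(body, @"\[(\d+)\](?=[^\[\]]|$)");

        var result = new List<(string File, long Length)>();
        for (int i = 0; i + 1 < parts.Length; i += 2)
        {
            // Consume the fields two at a time so each name stays with its length.
            result.Add((parts[i], long.Parse(parts[i + 1])));
        }
        return result;
    }
}
```

Stepping through the array two elements at a time is what yields the pairs (A and B, then C and D). The heuristic breaks if a filename can start with '[', or if a name legitimately ends in a `[digits]` group right before its length.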

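If the named captures from the question should be kept, one possible variation is to add a lookahead after the length group. This is only a sketch under the same unconfirmed assumption about the data: a `[digits]` group is a length only when the next character is not '[' or ']', or when it is the last item before the closing ']' of the list. `CaptureSketch` is a hypothetical name; the rest mirrors the original parsing code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureSketch
{
    // Hypothetical helper built on the pattern from the question.
    public static (string File, long Length)[] Parse(string text)
    {
        // Change from the original pattern: after "[digits]" a lookahead
        // requires either a non-bracket character or the final "]" of the
        // list. Bracketed digit groups inside a name, such as
        // "[0100152000022000][v0]", fail the lookahead, so the lazy name
        // keeps growing past them.
        var regex = new Regex(
            @"^Files\[(?:(?'name'.+?)\[(?'length'\d+)\](?=[^\[\]]|\]$),?)*\]$");

        var match = regex.Match(text);
        if (!match.Success)
        {
            throw new FormatException("Input does not match the expected format");
        }

        // Captures of both groups are stored in match order, so Zip pairs
        // the n-th name with the n-th length.
        var names = match.Groups["name"].Captures.Cast<Capture>();
        var lengths = match.Groups["length"].Captures.Cast<Capture>();
        return names
            .Zip(lengths, (f, n) => (File: f.Value, Length: long.Parse(n.Value)))
            .ToArray();
    }
}
```

The lazy quantifier inside a repeated group can backtrack heavily on long inputs that do not match, so this variant is likely slower than a hand-written loop.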