I've recently been learning about regular expressions. I'm trying to split an FDF file into its individual objects, each as its own string, which I can then parse. The problem is that my code only matches the first occurrence; all the other "objects" in the FDF file are ignored.
Objects begin on their own line with two numbers and the string "obj", followed by a carriage return (not a line feed). They end after a carriage return and the string "endobj".
// Testing: parse the file into objects.
List<String> FDFobjects = new List<String>();
String strRegex = @"^(?<obj>\d+ \d+) obj\r(?<objData>.+?)\rendobj(?=\r)";
Regex useRegex = new Regex(strRegex, RegexOptions.Multiline | RegexOptions.Singleline);
// Read the whole file into a single string.
StreamReader reader = new StreamReader(FileName);
String fdfString = reader.ReadToEnd();
reader.Close();
// Collect the data of every matched object.
foreach (Match useMatch in useRegex.Matches(fdfString))
    FDFobjects.Add(useMatch.Groups["objData"].Value);
// Print the first object (if any) and the number of objects found.
if (FDFobjects.Count > 0)
    Console.WriteLine(FDFobjects[0]);
Console.WriteLine(FDFobjects.Count);
(I was using $ at the end of the regex string, but that matches 0 times, whereas using (?=\r) matches once.)
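That behaviour looks consistent with how .NET treats the anchors: in Multiline mode, ^ and $ recognise only \n as a line break, so in text that uses a bare \r the ^ anchor matches only at the very start of the string. Here is a minimal sketch that shows the difference. The class name AnchorDemo, the Count helper, and the sample strings are made up for illustration and are not taken from a real FDF file.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: counts the matches of a pattern under the given options.
    public static int Count(string pattern, string input, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        // Made-up sample: three lines, each terminated by CR only.
        string crOnly = "1 0 obj\r2 0 obj\r3 0 obj\r";
        // The same text with LF as the line terminator.
        string lfOnly = crOnly.Replace('\r', '\n');

        // ^ matches at the start of the string and after each \n, but not after \r.
        Console.WriteLine(Count(@"^\d+ \d+ obj", crOnly, RegexOptions.Multiline)); // 1
        Console.WriteLine(Count(@"^\d+ \d+ obj", lfOnly, RegexOptions.Multiline)); // 3

        // $ matches before each \n and at the end of the string, but not before \r.
        Console.WriteLine(Count(@"obj$", crOnly, RegexOptions.Multiline)); // 0
        Console.WriteLine(Count(@"obj$", lfOnly, RegexOptions.Multiline)); // 3
    }
}
```

With CR-only line endings, ^ finds a single match and $ finds none, which matches what I was seeing.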
Edit: Some line breaks are CR/LF, and some are just CR. I don't know whether that's consistent across the different parts of the file, so I check for all of them. I've settled on the following, which seems to work perfectly so far (and I'm no longer using the Multiline option). Adding the lookbehind is what made the biggest difference here.
... = new Regex(@"(?<=^|[^\\](\r\n|\r|\n))(?<objName>\d+ \d+) obj(\r\n|\r|\n)(?<objData>.*?)(?<!\\)(\r\n|\r|\n)endobj(?=\r\n|\r|\n|$)", RegexOptions.Singleline);
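For completeness, here is a small self-contained sketch of how that pattern can be exercised. The class name FdfObjectSplitter, the ExtractObjectData method, and the sample text are hypothetical; the sample is a made-up fragment that mixes CR and CR/LF line breaks, not a real FDF file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FdfObjectSplitter
{
    // The pattern from the edit above, unchanged. No Multiline option is used.
    private static readonly Regex ObjectRegex = new Regex(
        @"(?<=^|[^\\](\r\n|\r|\n))(?<objName>\d+ \d+) obj(\r\n|\r|\n)(?<objData>.*?)(?<!\\)(\r\n|\r|\n)endobj(?=\r\n|\r|\n|$)",
        RegexOptions.Singleline);

    // Returns the body of every object found in the text, in file order.
    public static List<string> ExtractObjectData(string fdfText)
    {
        List<string> objects = new List<string>();
        foreach (Match match in ObjectRegex.Matches(fdfText))
        {
            objects.Add(match.Groups["objData"].Value);
        }
        return objects;
    }

    public static void Main()
    {
        // Made-up sample with mixed CR and CR/LF line breaks.
        string sample =
            "%FDF-1.2\r\n" +
            "1 0 obj\r<< /FDF << /Fields [2 0 R] >> >>\rendobj\r\n" +
            "2 0 obj\r<< /T (name) /V (value) >>\rendobj\r" +
            "trailer\r<< /Root 1 0 R >>\r%%EOF";

        List<string> objects = ExtractObjectData(sample);
        Console.WriteLine(objects.Count); // 2
        foreach (string data in objects)
        {
            Console.WriteLine(data);
        }
    }
}
```

On this sample, both objects are found even though the line breaks are not consistent.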