I am taking over a data-mining project written in C# that parses raw text files in order to store useful data in databases.
So far there is no problem: everything works out of the box, but I don't understand part of the syntax of one regular expression.
Why does the expression `Déposé et enregistré le (?<Registred>.+?)\s*(\r\n)` (French for "filed and registered on")
match the string `Déposé et enregistré le 16/09/2016`?
I expected that matching my string would require a stricter expression, such as `Déposé et enregistré le ([0-9]{2}\/[0-9]{2}\/[0-9]{4})`.
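For comparison, here is a minimal sketch of the stricter pattern using .NET's `System.Text.RegularExpressions`. The class and method names (`StrictDatePattern`, `ExtractDate`) are hypothetical; the group name `Registred` is reused from the original pattern. Unlike `.+?`, this version only captures text shaped like dd/MM/yyyy, and it does not require a line break after the date. In .NET, `/` is not a regex metacharacter, so `\/` and `/` behave the same.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrictDatePattern
{
    // Stricter pattern: exactly two digits, two digits, four digits, separated by '/'.
    private static readonly Regex DateRegex = new Regex(
        @"Déposé et enregistré le (?<Registred>[0-9]{2}/[0-9]{2}/[0-9]{4})",
        RegexOptions.None,
        TimeSpan.FromSeconds(3));

    // Returns the captured date, or null when no dd/MM/yyyy follows the prefix.
    public static string? ExtractDate(string line)
    {
        Match m = DateRegex.Match(line);
        return m.Success ? m.Groups["Registred"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractDate("Déposé et enregistré le 16/09/2016") ?? "(no match)");    // 16/09/2016
        Console.WriteLine(ExtractDate("Déposé et enregistré le prochainement") ?? "(no match)"); // (no match)
    }
}
```

The trade-off is that the strict pattern rejects malformed dates up front, while `.+?` accepts anything and leaves validation to later code.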
What confuses me is the `(?<Registred>.+?)` part, which in my opinion shouldn't match a date like 16/09/2016.
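A minimal sketch of what `(?<Registred>.+?)` actually accepts, using the pattern exactly as it appears in the question (the class and method names are hypothetical). `.` matches any character except `\n`, and `+?` is a lazy "one or more" quantifier: the engine starts with a single character and extends the group only until the rest of the pattern, `\s*(\r\n)`, can match. A date is just a run of such characters, so it is accepted, but so is any other text. The sketch also shows that the pattern needs a CRLF after the value, so the matched data presumably uses Windows line endings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyGroupDemo
{
    // Pattern copied from the question.
    private const string Pattern = @"Déposé et enregistré le (?<Registred>.+?)\s*(\r\n)";

    // Returns the "Registred" capture, or null when the pattern does not match.
    public static string? Capture(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Groups["Registred"].Value : null;
    }

    public static void Main()
    {
        // A date is just a run of non-newline characters, so .+? accepts it.
        Console.WriteLine(Capture("Déposé et enregistré le 16/09/2016\r\n"));
        // Any other text is accepted too; trailing spaces are consumed by \s*, not by the group.
        Console.WriteLine(Capture("Déposé et enregistré le n'importe quoi  \r\n"));
        // Without a trailing CRLF the whole pattern fails.
        Console.WriteLine(Capture("Déposé et enregistré le 16/09/2016") ?? "(no match)");
    }
}
```

In other words, the pattern does not recognize dates at all. It simply captures "everything after the prefix up to the end of the line", minus trailing whitespace.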
Here is the code that performs the match:
```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Method body: r (the configured regex), data (the raw record), log and RegexResult
// come from the surrounding project.
var results = new List<RegexResult>();
string regexS = r.RegexValue;
try
{
    // Abort any single match operation that runs longer than 3 seconds.
    var regex = new Regex(regexS, RegexOptions.None, TimeSpan.FromSeconds(3));
    // Group names do not change between matches, so fetch them once.
    string[] groupNames = regex.GetGroupNames();
    int occurenceCounter = 0;
    foreach (Match match in regex.Matches(data.Data))
    {
        foreach (string groupName in groupNames)
        {
            string resultValue = match.Groups[groupName].Value.Trim();
            // Group "0" is the whole match; store only the sub-groups.
            if (groupName != "0")
            {
                results.Add(new RegexResult(data.Id, r, resultValue, groupName, occurenceCounter));
            }
            log.Info("RawData Id : {0} | Regex Id : {1} | groupName {2} : {3}", data.Id, r.Id, groupName, resultValue);
        }
        occurenceCounter++;
    }
}
catch (RegexMatchTimeoutException e)
{
    // Log the record id (not the whole object) and include the exception message.
    log.Error("RegexMatchTimeoutException for Id {0} and regex {1}: {2}", data.Id, regexS, e.Message);
}
return results;
```
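One side effect worth checking in this code: `(\r\n)` is an unnamed capturing group, so `GetGroupNames()` returns "1" in addition to "0" and "Registred", and the loop stores the trimmed (empty) value of group "1" as an extra result. The minimal sketch below, with hypothetical class and method names, compares the group names the loop would see with and without `RegexOptions.ExplicitCapture`, which turns unnamed parentheses into non-capturing groups. Writing `(?:\r\n)` in the pattern would have the same effect for that one group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupNamesDemo
{
    // Pattern copied from the question.
    private const string Pattern = @"Déposé et enregistré le (?<Registred>.+?)\s*(\r\n)";

    // Group names the question's loop would store, i.e. everything except the whole-match group "0".
    public static string[] CapturedGroupNames(RegexOptions options) =>
        new Regex(Pattern, options, TimeSpan.FromSeconds(3))
            .GetGroupNames()
            .Where(name => name != "0")
            .ToArray();

    public static void Main()
    {
        // Default options: the unnamed (\r\n) group is captured and shows up as "1".
        Console.WriteLine(string.Join(", ", CapturedGroupNames(RegexOptions.None)));
        // ExplicitCapture: only the named group remains.
        Console.WriteLine(string.Join(", ", CapturedGroupNames(RegexOptions.ExplicitCapture)));
    }
}
```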
Any ideas?