I'm parsing a text file line by line, and each kind of line has its own regex. In one case, however, a pattern matches two kinds of lines: one is a correct match, and the other is matched only partially because several of the values are optional.

Invalid match:

BNE1010/1000 HKG1955/2005 7/PLD/CLD/YLD

The pattern matches only part of the string (it shouldn't match this line at all):

BNE1010/1000

Correct match (matches the entire string):

RG878A/21AUG15 GIG/BOG 1/RG/AV 3/AV 4/AV 5/RG 6/AV081C/22 7/CDC/YD 9/TP

The regex for this is quite long and contains several optional groups:

^(?<FlightDesignator>([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))(?<OperationalSuffix>[A-Z])?(?<FlightIdentifierDate>\/(\d{2})([A-Z]{3})?(\d{2})?)?(\s(?<FlightLegsChangeIdentifier>(\/?[A-Z]{3})+)(?=(\s|$)))?(\s1(?<JointOperationAirlineDesignators>(\/.{2}[A-Z]?)+))?(\s3\/(?<AircraftOwner>([A-Z]{2}|.)))?(\s4\/(?<CockpitCrewEmployer>(.+?)(?=(?: \d\/|$))))?(\s5\/(?<CabinCrewEmployer>([A-Z]{2}|.)))?(?<OnwardFlight>\s6\/(([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))([A-Z])?(\/(\d{2})([A-Z]{3})?(\d{2})?)?)?(\s7\/(?<MealServiceNote>(\/?[A-Z]{0,3})+))?(\s9\/(?<OperatingAirlineDisclosure>(.{2}[A-Z]?)))?

I think there is no need to study the entire regex because it's built dynamically from smaller patterns at runtime, and all the parts work correctly. Many combinations are also covered by unit tests, and they all pass... as long as I only try to parse the line that the pattern is meant to match.

Currently I'm checking if the entire string is matched by

match.Groups[0].Value == line

but I find that quite ugly. I know from JavaScript that the regex engine exposes an index property telling where the engine stopped, so my idea was to compare that index with the length of the string. Unfortunately, I wasn't able to find such a property in C#.

Another idea would be to modify the regex so that it matches only one line and no partial lines.

Example: https://regex101.com/r/dM5wU4/1

The example contains only two cases because there aren't actually any combinations that would change its behavior. I could remove some parameters but it wouldn't change anything.

EDIT:

I've edited my question. Sorry to everyone for not providing all the information the first time. I won't ask any more questions when writing on the phone :) It wasn't a good idea. Hopefully it won't get closed now.

You asked whether I could simplify the regex. I would do it if I could and knew how. If it was easy I wouldn't have asked. The problem started as the regex and the strings grew during development. Now they are at production length and I can't actually make them shorter, even for the sake of the question, sorry.

EDIT-2:

I found the reason why I couldn't find the inherited Index and Length properties of the Match class.

For some strange reason when selecting the Match class and pressing F1 Visual Studio opened the wrong help page (Match Properties) even though I'm not working with the Micro Framework. I didn't notice that but I was indeed wondering why there is very little information. Thx to @Jamiec for the correct link. I won't trust Visual Studio anymore when hitting F1.

t3chb0t
  • I would love to see some code to make this post clear to me. – Patrick Hofman Sep 24 '15 at 11:19
  • I suggest you put examples. Include both of your regexes and a handful of examples indicating what should and shouldn't be a hit. As it stands it is hard to make sense of what you are saying. – musefan Sep 24 '15 at 11:20
  • Why couldn't you find it? `match.Index`, or if you need the index of the group - `match.Groups[n].Index`. Then you can also use the `.Length` to see if the match length is equal to the whole string. As for your regex, try adding `$` at the end (or `\r?$`). – Wiktor Stribiżew Sep 24 '15 at 11:22
  • I've added an example. – t3chb0t Sep 24 '15 at 11:23
  • Somehow, your examples have made this question *harder* to understand. Simply adding `$` at the end of your regex makes it match the entire string, and that stops the first line matching - but I doubt that's the answer to your question. – Jamiec Sep 24 '15 at 11:24
  • You can call `Regex.Match` instead of `Regex.IsMatch` and check the resulting `Match` object to retrieve the necessary info (the start and the length of the match, as far as I can see). – Mark Shevchenko Sep 24 '15 at 11:24
  • @t3chb0t: Does adding `\r?$` at the end of the regex solve the issue? – Wiktor Stribiżew Sep 24 '15 at 11:27
  • @Jamiec I was afraid of that. I'm not actually asking how to correct the pattern but rather how I can tell whether everything or only a part was matched. If I try to parse the first line with my pattern I get `match.Success` true because the other parameters are optional. – t3chb0t Sep 24 '15 at 11:28
  • Can't you simplify your sample a little and post it on-site please? – Patrick Hofman Sep 24 '15 at 11:39
  • @PatrickHofman I would have done it if I could :-( sorry. If it was easy (for me) I wouldn't have asked. – t3chb0t Sep 24 '15 at 11:49
  • Sorry to everyone for this poorly written question. I will improve it later as soon as I get to my computer ;-) no more questions on the phone. It's too hard to write it with examples. – t3chb0t Sep 24 '15 at 12:03
  • Are you parsing it line by line? Or is it `Matches()` against a multiline list of values? – Mariano Sep 24 '15 at 12:12
  • @Mariano I'm doing it line by line. The example contains just the two lines I had problems with because of partial matches. – t3chb0t Sep 24 '15 at 12:25
  • I've edited the question so that it at least now looks like it should have looked from the beginning ;-) – t3chb0t Sep 24 '15 at 16:27

1 Answer

Disclaimer: I'm going to add this, but I doubt it's the solution. If it's not, this part will get deleted in short order.

You can add a `$` at the end of your regular expression. This stops your first example from matching, while the second example continues to match.

As you've not provided more than two examples, it's unclear whether this actually solves all your cases or just that one specific false positive.
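
To show why the trailing `$` matters, here is a minimal sketch. It uses a much shorter, hypothetical pattern rather than the one from the question; the group names `Flight`, `Date` and `Meal` and the shortened second input are assumptions made purely for illustration. The point is only that optional groups let an unanchored pattern succeed on a prefix of the line, while the anchored version has to consume the whole line or fail.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical, shortened pattern (NOT the production regex):
        // a mandatory flight designator followed by two optional parts.
        const string core =
            @"^(?<Flight>[A-Z0-9]{2}[A-Z]?[0-9]{3,4})(?<Date>/\d{2})?(\s7/(?<Meal>[A-Z/]+))?";

        var unanchored = new Regex(core);       // may stop in the middle of the line
        var anchored = new Regex(core + "$");   // must reach the end of the line

        var partial = "BNE1010/1000 HKG1955/2005 7/PLD/CLD/YLD";
        var full = "RG878/21 7/CDC/YD";         // shortened, made-up variant of the valid line

        // Without the anchor the optional groups allow a prefix match.
        Console.WriteLine(unanchored.Match(partial).Value); // BNE1010/10

        // With the anchor the first line is rejected and the second still matches.
        Console.WriteLine(anchored.IsMatch(partial));       // False
        Console.WriteLine(anchored.IsMatch(full));          // True
    }
}
```

Note that in .NET, `$` without `RegexOptions.Multiline` also matches just before a final newline at the end of the string; if the lines might carry a trailing `\n` that should not be tolerated, `\z` is the stricter anchor.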


> My question is whether it is possible to check if a regular expression matched the entire string without checking the first group against the original line?

If you're not averse to comparing the length of the match with the length of the string, you can do that too:

using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"^(?<FlightDesignator>([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))(?<OperationalSuffix>[A-Z])?(?<FlightIdentifierDate>\/(\d{2})([A-Z]{3})?(\d{2})?)?(\s(?<FlightLegsChangeIdentifier>(\/?[A-Z]{3})+)(?=(\s|$)))?(\s1(?<JointOperationAirlineDesignators>(\/.{2}[A-Z]?)+))?(\s3\/(?<AircraftOwner>([A-Z]{2}|.)))?(\s4\/(?<CockpitCrewEmployer>(.+?)(?=(?: \d\/|$))))?(\s5\/(?<CabinCrewEmployer>([A-Z]{2}|.)))?(?<OnwardFlight>\s6\/(([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))([A-Z])?(\/(\d{2})([A-Z]{3})?(\d{2})?)?)?(\s7\/(?<MealServiceNote>(\/?[A-Z]{0,3})+))?(\s9\/(?<OperatingAirlineDisclosure>(.{2}[A-Z]?)))?");

var input1 = @"BNE1010/1000 HKG1955/2005 7/PLD/CLD/YLD";
var input2 = @"RG878A/21AUG15 GIG/BOG 1/RG/AV 3/AV 4/AV 5/RG 6/AV081C/22 7/CDC/YD 9/TP";

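var match1 = regex.Match(input1);
var match2 = regex.Match(input2);

// The pattern starts with ^, so a successful match always begins at index 0;
// comparing the lengths is therefore enough to detect a full-line match.
Console.WriteLine(match1.Length == input1.Length); // False
Console.WriteLine(match2.Length == input2.Length); // True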

Live example: http://rextester.com/NIBE6349

Jamiec
  • Wow, it really seems to have solved it. Both examples are actually all I have. All the number/slash parameters are optional, but there aren't many combinations in use, so I guess I will have to rewrite my question later... when not writing from the phone ;-) – t3chb0t Sep 24 '15 at 11:34
  • I now know why I didn't find the `Length` property. It's provided by the `Capture` class, which is one of the base classes of `Match`, and I was only looking on MSDN, expecting the `Match` class itself to have it, but it's not listed on its page. If only I had tried it in Visual Studio. *embarrassed* – t3chb0t Sep 24 '15 at 11:41
  • @t3chb0t I don't get that - if I look at the docs for `Match` (https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.match%28v=vs.110%29.aspx), `Length` is listed as a property, along with the note that it's inherited from `Capture`?!? – Jamiec Sep 24 '15 at 11:44
  • See here: msdn.microsoft.com/en-us/library/hh454327.aspx I pressed F1 in Visual Studio on the `Match` class and this page opened. Pretty misleading. I was wondering what had happened because, like you've said, I know the docs list all inherited members. – t3chb0t Sep 24 '15 at 11:52