I need to extract the string between two words. The start word repeats multiple times in the text file, while the end word is unique. I want the entire string between the last start word and the end word.

I tried a regex to get multiple matches, but it returns the entire string from the first start word to the end word.

Then I used a loop that removed the leading start word and ran the match again each time. This takes a long time and is not good practice either.

Segmentmatch = Regex.Match(text, "Segment(.*?)0091", RegexOptions.Singleline);

FULL TEXT:

Segment DTM*   Tag DTM
0374:2*                       DATE/TIME QUALIFIER

Segment R4*    Tag R4  
0115*                       PORT OR TERMINAL FUNCTION CODE

Segment R2A*   Tag R2A  
1431*                         PREFERENCE                                                                    
0091:3*                       TRANSPORTATION METHOD/TYPE CODE

Expected Result: Text between the last segment and 0091

R2A*   Tag R2A  
1431*                         PREFERENCE 

Actual Result: The code returns the entire text between the first Segment and 0091

DTM*   Tag DTM                                                  
0374:2*                       DATE/TIME QUALIFIER

Segment R4*    Tag R4  
0115*                       PORT OR TERMINAL FUNCTION CODE

Segment R2A*   Tag R2A  
1431*                         PREFERENCE
Sweeper
Raj94
  • If the Segment part and 0091 occur twice, do you expect 1 match https://regex101.com/r/AQXU3z/1 or do you expect 2 matches https://regex101.com/r/AQXU3z/2 ? – The fourth bird Aug 23 '19 at 14:34

2 Answers


To match the last occurrence of Segment and capture everything up to 0091 in a group, you could use:

.*\bSegment[ \t]+(.*)\r?\n0091\b
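  • .*\bSegment Match any char 0+ times including newlines (this relies on RegexOptions.Singleline); because .* is greedy, it backtracks to the last occurrence of Segment
  • [ \t]+(.*) Match 1+ tabs or spaces, then capture any char 0+ times in group 1
  • \r?\n0091\b Match a newline followed by 0091 and a word boundary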
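
For reference, here is a minimal sketch of how the pattern could be used from C#. The helper name ExtractAfterLastSegment and the shortened sample text are my own assumptions for illustration, not part of the original question. The pattern is written as a verbatim string (@"...") so that \b reaches the regex engine as a word boundary and is not turned into a backspace character by the C# compiler.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text between the last "Segment" and the line
// that starts with 0091, or an empty string when the pattern does not match.
static string ExtractAfterLastSegment(string text)
{
    // Singleline lets '.' match newlines, so the greedy leading .* can run past
    // the earlier "Segment" occurrences and backtrack to the last one.
    var regex = new Regex(@".*\bSegment[ \t]+(.*)\r?\n0091\b", RegexOptions.Singleline);
    Match m = regex.Match(text);

    // The wanted text is in capture group 1; m.Value would be the whole match.
    return m.Success ? m.Groups[1].Value : string.Empty;
}

// Shortened version of the sample data (spacing reduced for readability).
string sample =
    "Segment DTM*  Tag DTM\n" +
    "0374:2*  DATE/TIME QUALIFIER\n" +
    "\n" +
    "Segment R4*  Tag R4\n" +
    "0115*  PORT OR TERMINAL FUNCTION CODE\n" +
    "\n" +
    "Segment R2A*  Tag R2A\n" +
    "1431*  PREFERENCE\n" +
    "0091:3*  TRANSPORTATION METHOD/TYPE CODE\n";

Console.WriteLine(ExtractAfterLastSegment(sample));
```

Note that the whole match (Match.Value or ToString()) still starts at the first Segment, because the leading .* is part of the match. Only Groups[1] holds the text after the last Segment.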

The fourth bird
  • OP does not need any `(?s)` modifier as OP is already using `RegexOptions.Singleline` and the dot already matches a newline here. – Wiktor Stribiżew Aug 23 '19 at 14:16
  • When I use the same regex in .NET, the entire text from the first Segment DTM to 0091 is returned. How do I use the Match function in C# to get the result from the groups? `Regex r = new Regex(".*\\bSegment[ \t]+(.*)\r?\n0091\b", RegexOptions.Singleline); MatchCollection matches = r.Matches(text);` – Raj94 Aug 23 '19 at 15:47
  • @Raj94 I have added a screenshot where you can see that the match is already in group 1. If the Segment part and 0091 occur twice, do you expect 1 match https://regex101.com/r/AQXU3z/1 or do you expect 2 matches https://regex101.com/r/AQXU3z/2 ? – The fourth bird Aug 23 '19 at 15:50
  • @Thefourthbird, gotcha, thanks a lot. C# code to read the value from the group: `Regex r = new Regex(".*\\bSegment[ \\t]+(.*)\\r?\\n0091\\b", RegexOptions.Singleline); MatchCollection matches = r.Matches(text); foreach (Match ItemMatch in matches) { Group g = ItemMatch.Groups[1]; string SegmentName = g.Value; }` – Raj94 Aug 23 '19 at 16:59
  • @Raj94 Did you see my other question about which of the 2 matches you expect? – The fourth bird Aug 23 '19 at 17:02

You don't need regex for this if the start and end words are all constants.

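// LastIndexOf is the magic here: it finds the final occurrence of the start word
const string startWord = "Segment";
var segmentIndex = yourString.LastIndexOf(startWord, StringComparison.Ordinal);
var startIndex = segmentIndex + startWord.Length; // skip past the start word itself
// Look for the end word only after the start word
var endIndex = yourString.IndexOf("0091", startIndex, StringComparison.Ordinal);
var extractedString = yourString.Substring(startIndex, endIndex - startIndex);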

If the start and end words are not constants, and are instead defined as substrings that match a regex pattern, you could use Regex.Matches to find the last match.

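// Requires: using System.Linq; using System.Text.RegularExpressions;
var lastMatch = Regex.Matches(yourString, someRegex).Cast<Match>().Last();
var startIndex = lastMatch.Index + lastMatch.Length;
var endIndex = Regex.Match(yourString, someOtherRegex).Index;
var extractedString = yourString.Substring(startIndex, endIndex - startIndex);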
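
A slightly more defensive sketch of the regex-based variant, in case either word might be missing from the input. The helper name TryGetBetween, the two patterns and the sample text are assumptions made up for illustration. The end pattern is searched only from the end of the last start match, using the instance overload Regex.Match(string, int).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: finds the text between the LAST match of startPattern
// and the first match of endPattern that follows it.
static bool TryGetBetween(string text, string startPattern, string endPattern, out string result)
{
    result = string.Empty;

    // Enumerate all start matches and keep the last one (null when there are none).
    Match lastStart = Regex.Matches(text, startPattern).Cast<Match>().LastOrDefault();
    if (lastStart == null) return false;

    int startIndex = lastStart.Index + lastStart.Length;

    // Begin the end-word search at startIndex so earlier occurrences are ignored.
    Match end = new Regex(endPattern).Match(text, startIndex);
    if (!end.Success) return false;

    result = text.Substring(startIndex, end.Index - startIndex);
    return true;
}

// Shortened sample in the shape of the question's data (assumed for illustration).
string sample =
    "Segment DTM*  Tag DTM\n" +
    "0374:2*  DATE/TIME QUALIFIER\n" +
    "Segment R2A*  Tag R2A\n" +
    "1431*  PREFERENCE\n" +
    "0091:3*  TRANSPORTATION METHOD/TYPE CODE\n";

if (TryGetBetween(sample, @"\bSegment[ \t]+", @"\r?\n0091\b", out string between))
{
    Console.WriteLine(between);
}
```

Returning a bool instead of throwing keeps the not-found case explicit, whereas Last() in the snippet above throws when there is no match.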
Sweeper