0

How can I retrieve both strings between STRING and END in this sentence?

"This is STRING a222 END, and this is STRING b2838 END."

strings that I want to get:

a222 
b2838

The following is my code, and I only manage to get the first string, which is a222:


string myString = "This is STRING a222 END, and this is STRING b2838 END.";

int first = myString.IndexOf("STRING") + "STRING".Length;
int second= myString.LastIndexOf("END");

string result = myString.Substring(first, second - first);


RGuest
  • 19
  • 1

5 Answers

2

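Here is a solution using regular expressions. The lookbehind `(?<=STRING )` and the lookahead `(?= END)` anchor each match between the two tokens without including them, and the lazy `.*?` stops at the nearest END.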

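// Requires: using System.Text.RegularExpressions;
var reg = new Regex("(?<=STRING ).*?(?= END)");
var matched = reg.Matches("This is STRING a222 END, and this is STRING b2838 END.");

// Each match is the text between "STRING " and " END"
foreach (Match m in matched)
{
   Console.WriteLine(m.Value);
}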
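The same lookbehind/lookahead idea can be wrapped in a helper that accepts arbitrary delimiters. This is only a sketch: the names `RegexBetween` and `Between` are hypothetical, and it assumes the delimiters should be matched literally (hence `Regex.Escape`) and that surrounding whitespace should be trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexBetween
{
    // Hypothetical helper: returns every run of text found between
    // startToken and the nearest following endToken.
    public static List<string> Between(string input, string startToken, string endToken)
    {
        // Escape the tokens so characters such as '.' or '(' are matched literally.
        string pattern = "(?<=" + Regex.Escape(startToken) + ").*?(?=" + Regex.Escape(endToken) + ")";

        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value.Trim());
        }
        return results;
    }

    public static void Main()
    {
        const string source = "This is STRING a222 END, and this is STRING b2838 END.";
        foreach (string token in Between(source, "STRING", "END"))
        {
            Console.WriteLine(token);
        }
    }
}
```

Note that `.` does not match a newline by default, so text spanning several lines would need `RegexOptions.Singleline`.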
Sowmyadhar Gourishetty
  • 1,843
  • 1
  • 8
  • 15
  • 1
    RegEx is too painful for this kind of operation. – Biju Kalanjoor Aug 19 '20 at 13:59
  • It's not; the time complexity of finding the matched string is O(N), where N is the length of the string. The regular expression functions in the library are already optimized. Can you please explain why it is so? – Sowmyadhar Gourishetty Aug 19 '20 at 14:02
  • Compared to a simple "split", RegEx is expensive. Also, we need to call the Replace method again and again. – Biju Kalanjoor Aug 19 '20 at 14:04
  • Using a combination of lookbehind (for STRING) & lookahead (for END) in your regex would remove the need to do the replaces : _"var reg = new Regex( "(?<=STRING ).*?(?= END)");"_ Here : https://dotnetfiddle.net/qHL2JI – PaulF Aug 19 '20 at 14:08
  • 1
    @PaulF, thanks for that; I updated the logic as mentioned. I didn't know about that approach, so thanks for letting us know. – Sowmyadhar Gourishetty Aug 19 '20 at 14:14
  • 2
    @SowmyadharGourishetty: Regex lookahead/lookbehind can be very useful. There's a bit of info here with some links: https://stackoverflow.com/questions/2973436/regex-lookahead-lookbehind-and-atomic-groups – PaulF Aug 19 '20 at 14:26
1

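You can iterate over the indexes of each STRING:

string myString = "This is STRING a222 END, and this is STRING b2838 END.";
const string start = "STRING", end = "END";
// Jump to the starting index of each `STRING` (i >= 0 so a match at index 0 is not skipped)
for (int i = myString.IndexOf(start); i >= 0; i = myString.IndexOf(start, i + 1))
{
    // The content begins right after the current STRING
    int contentStart = i + start.Length;
    // Get the index of the next END after this STRING; stop if there is none
    int endIndex = myString.IndexOf(end, contentStart);
    if (endIndex < 0) break;
    // Print the substring between STRING and END for each occurrence
    Console.WriteLine(myString.Substring(contentStart, endIndex - contentStart).Trim());
}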



In your case, STRING..END occurs multiple times, but you take the index of only the first STRING and the last index of END, so Substring returns everything from the first STRING to the last END,

i.e.

a222 END, and this is STRING b2838 
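
To make the difference concrete, the sketch below compares the question's `LastIndexOf` approach with a search for the nearest END after the first STRING. The class and method names (`IndexDemo`, `FirstToLast`, `FirstToNext`) are hypothetical and exist only for illustration; the brackets in the output make the surrounding spaces visible.

```csharp
using System;

public static class IndexDemo
{
    // What the question's code computes: first STRING up to the LAST END.
    public static string FirstToLast(string s)
    {
        int first = s.IndexOf("STRING") + "STRING".Length;
        int last = s.LastIndexOf("END");
        return s.Substring(first, last - first);
    }

    // First STRING up to the NEAREST END that follows it.
    public static string FirstToNext(string s)
    {
        int first = s.IndexOf("STRING") + "STRING".Length;
        int next = s.IndexOf("END", first);
        return s.Substring(first, next - first);
    }

    public static void Main()
    {
        string myString = "This is STRING a222 END, and this is STRING b2838 END.";
        Console.WriteLine("[" + FirstToLast(myString) + "]"); // [ a222 END, and this is STRING b2838 ]
        Console.WriteLine("[" + FirstToNext(myString) + "]"); // [ a222 ]
    }
}
```

Both helpers assume the delimiters are present; neither checks for a -1 result.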
Prasad Telkikar
  • 15,207
  • 5
  • 21
  • 44
  • `End` maybe a type-o? I'm not sure if it's case sensitive, but if so, `END` is maybe what the user is after perhaps? Also, could you explain what the OP did wrong in their attempt and what is not working as intended so they could better understand? – Trevor Aug 19 '20 at 13:45
  • 1
    @Çöđěxěŕ, thanks for your input. I fixed it. Have a look at my answer – Prasad Telkikar Aug 19 '20 at 14:21
  • 1
    Thanks for the update and great answer with explanation! – Trevor Aug 19 '20 at 14:52
  • You'll have that, it is better to leave a comment as to why, but I've known many don't. – Trevor Aug 20 '20 at 10:49
1

You can pass a value for startIndex to string.IndexOf() and use it to resume the search on each pass of a loop:

    IEnumerable<string> Find(string input, string startDelimiter, string endDelimiter)
    {
        int first = 0, second;

        do
        {
            // Find start delimiter; check for -1 before adding its length
            int start = input.IndexOf(startDelimiter, startIndex: first);

            if (start == -1)
                yield break;

            first = start + startDelimiter.Length;


            // Find end delimiter
            second = input.IndexOf(endDelimiter, startIndex: first);

            if (second == -1)
                yield break;


            yield return input.Substring(first, second - first).Trim();
            // Resume right after the end delimiter so an adjacent start delimiter is not skipped
            first = second + endDelimiter.Length;
        }
        while (first < input.Length);
    }
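
For reference, here is a self-contained variant of this loop together with a call site. It is a sketch rather than the answer's exact code: the class name `DelimiterSearch` is hypothetical, and it assumes ordinal (culture-insensitive) comparison and trimmed results are what is wanted.

```csharp
using System;
using System.Collections.Generic;

public static class DelimiterSearch
{
    // Yields each trimmed run of text between startDelimiter and the next endDelimiter.
    public static IEnumerable<string> Find(string input, string startDelimiter, string endDelimiter)
    {
        int position = 0;

        while (position < input.Length)
        {
            // Locate the next start delimiter; stop when there are no more.
            int start = input.IndexOf(startDelimiter, position, StringComparison.Ordinal);
            if (start == -1)
                yield break;

            int contentStart = start + startDelimiter.Length;

            // Locate the matching end delimiter; an unpaired start yields nothing.
            int end = input.IndexOf(endDelimiter, contentStart, StringComparison.Ordinal);
            if (end == -1)
                yield break;

            yield return input.Substring(contentStart, end - contentStart).Trim();

            // Continue searching right after the end delimiter.
            position = end + endDelimiter.Length;
        }
    }

    public static void Main()
    {
        const string source = "This is STRING a222 END, and this is STRING b2838 END.";
        foreach (string token in Find(source, "STRING", "END"))
        {
            Console.WriteLine(token);
        }
    }
}
```

With the sample sentence this prints a222 and b2838 on separate lines; a trailing STRING with no END after it produces no extra result.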
1

You've already got some good answers, but I'll add another that uses ReadOnlyMemory<char> from .NET Core. It provides a solution that doesn't allocate new strings, which can be nice. C# iterators are a common way to transform one sequence (of chars, in this case) into another. This method transforms the input string into a sequence of ReadOnlyMemory<char> values, each containing one of the tokens you're after.

    public static IEnumerable<ReadOnlyMemory<char>> Tokenize(string source, string beginPattern, string endPattern)
    {
        if (string.IsNullOrEmpty(source) ||
            string.IsNullOrEmpty(beginPattern) ||
            string.IsNullOrEmpty(endPattern))
            yield break;

        var sourceText = source.AsMemory();

        int start = 0;

        while (start < source.Length)
        {
            start = source.IndexOf(beginPattern, start);

            if (-1 != start)
            {
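                // Skip past the begin pattern first so the end pattern is only
                // searched for in the text that follows it
                start += beginPattern.Length;
                int end = source.IndexOf(endPattern, start);

                if (-1 != end)
                {
                    yield return sourceText.Slice(start, (end - start));
                }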
                else
                    break;

                start = end + endPattern.Length;
            }
            else
            {
                break;
            }
        }
    }

Then you'd just call it like so to iterate over the tokens...

    static void Main(string[] args)
    {
        const string Source = "This is STRING a222 END, and this is STRING b2838 END.";

        foreach (var token in Tokenize(Source, "STRING", "END"))
        {
            Console.WriteLine(token);
        }
    }
MikeJ
  • 1,299
  • 7
  • 10
0
string myString = "This is STRING a222 END, and this is STRING b2838 END.";
// Fix the issue based on @PaulF's comment.
if (myString.StartsWith("STRING"))
     myString = $"DUMP {myString}";

var arr = myString.Split(new string[] { "STRING", "END" }, StringSplitOptions.RemoveEmptyEntries);

for (int i = 0; i < arr.Length; i++)
{
      if(i%2 > 0)
      {
          // This is your string
          Console.WriteLine(arr[i].Trim());
      }
}
Biju Kalanjoor
  • 532
  • 1
  • 6
  • 12
  • 3
    That only works if STRING & END are paired, and also some text precedes the first STRING – PaulF Aug 19 '20 at 13:42
  • 1
    Hi PaulF, I believe he mentioned in the question "how can I retrieve both string between STRING & END in this sentence". – Biju Kalanjoor Aug 19 '20 at 13:47
  • 1
    @BijuKalanjoor could you update your post to include what the OP did wrong in their attempt and how this addresses their issue? – Trevor Aug 19 '20 at 13:48
  • 1
    @BijuKalanjoor: if all the OP wanted to do was extract the values from that particular string then I would suggest counting the characters & doing 2 substring operations. I am assuming that OP actually wants a generic solution that will work for any string passed to it. I have pointed out two ways your solution will fail to get the results asked for. It may be this answer suits OP though - so if it is marked as the correct answer then I guess it is what is required. – PaulF Aug 19 '20 at 14:20
  • @PaulF, according to this sentence, "how can I retrieve both string between STRING & END in this sentence", OP wants to get a string between two tokens. That's why I suggested this method rather than regex. If the OP really wants the behavior you mentioned, I completely agree with you. – Biju Kalanjoor Aug 20 '20 at 03:10
  • _"// work around based on @PaulF's comment."_ - my first reaction when I see/hear anyone in my team mention "work around" is to ask when they are expecting to fix the problem in their code that they need to work around. I am opposed to the idea of "work arounds" - a work around is a temporary fix to an underlying problem that needs addressing, once the "work around" is introduced the problem appears to have been fixed & is forgotten - the "work around" ends up in production code but the original problem is still there & may cause problems in the future. Fix the problem, don't work around it. – PaulF Aug 20 '20 at 07:05
  • @PaulF, lol... that's my bad. I'll correct the comment. Thanks – Biju Kalanjoor Aug 20 '20 at 07:25
  • Exactly my point - it is not the comment that needs fixing - it is the code that needs fixing. – PaulF Aug 20 '20 at 07:26
  • I fixed the issue "also some text precedes the first STRING". Otherwise, I believe this is what OP wants. – Biju Kalanjoor Aug 20 '20 at 07:28
  • Your comment was correct - you worked around the problem highlighted, you didn't attempt to correct the problem with your code. Generating a second string to work around a simple code fix may have major performance implications. – PaulF Aug 20 '20 at 07:36
  • Also - does OP want b2838 returning if the final END is not present? Your code returns anything after the final STRING, even if there is no END following. – PaulF Aug 20 '20 at 07:43