How can I remove the rest of a string after a particular string/pattern?

Question

Does anyone know how I can remove the rest of a string after a particular string or pattern?

For example, I have saved some HTML code into a string as below:

String test;

test = '<html xmlns="http://www.w3.org/1999/xhtml"><head runat="server"><title></title></head><body><table> <tr><td>test</td></tr> </table><div id="13"> </body> test test test test </html>'

How can I remove the rest of the text after <div id="13"> in C# .NET?

What is that 'particular sign'? Do you know its location, or is it text after which you want to cut? – Migol, Aug 19 '15 at 00:35
Is there any rule for the semantics? I can see that `<div id="13">` doesn't have an ending tag. Is there any special case about it? – Joel Legaspi Enriquez, Aug 19 '15 at 00:36

Andrew · Accepted Answer · 2015-08-19T00:56:45.720

2

If you want the ending token to be excluded, you can use this:

string test = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head runat=\"server\"><title></title></head><body><table> <tr><td>test</td></tr> </table><div id=\"13\"> </body> test test test test </html>";
// FirstOrDefault() is a LINQ extension method, so this needs "using System.Linq;".
string result = test.Split(new string[] { "<div id=\"13\">" }, StringSplitOptions.None).FirstOrDefault();

If you want the ending token to be included, you can use this:

string test = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head runat=\"server\"><title></title></head><body><table> <tr><td>test</td></tr> </table><div id=\"13\"> </body> test test test test </html>";
string endString = "<div id=\"13\">";
string result = test.Substring(0, test.IndexOf(endString) + endString.Length);

Beware that string literals must be enclosed in double quote characters and not apostrophes, and quote characters inside them must be escaped by preceding them with a \.

Also note that my code doesn't do any kind of validation; I leave that up to you. :)
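For what it's worth, here is a minimal sketch of what that validation could look like. The `StringTrimmer` class and its `TruncateAfter` method are hypothetical names made up for this illustration, and returning the input unchanged when the token is missing is an assumption; you may prefer to throw or return null instead.

```csharp
using System;

public static class StringTrimmer
{
    // Hypothetical helper: returns the part of `source` that ends at the first
    // occurrence of `token`, optionally keeping the token itself.
    // Assumption: if the token is missing (or an input is null/empty),
    // the source is returned unchanged.
    public static string TruncateAfter(string source, string token, bool includeToken)
    {
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(token))
        {
            return source;
        }

        // Ordinal comparison: match the exact characters, ignoring culture rules.
        int index = source.IndexOf(token, StringComparison.Ordinal);
        if (index < 0)
        {
            return source; // token not found, so there is nothing to cut
        }

        int length = includeToken ? index + token.Length : index;
        return source.Substring(0, length);
    }

    public static void Main()
    {
        string html = "<table></table><div id=\"13\"> </body> test test";
        Console.WriteLine(TruncateAfter(html, "<div id=\"13\">", true));
        Console.WriteLine(TruncateAfter(html, "<div id=\"13\">", false));
    }
}
```

Without the `index < 0` check, the `Substring` version above would quietly return a wrong prefix (or throw) whenever the token isn't present, because `IndexOf` returns -1 in that case.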

edited Aug 19 '15 at 00:56

answered Aug 19 '15 at 00:50

Andrew


They must only be escaped if it's a string literal (in the case of the split token in `result`). The HTML presented in `test` would most likely be downloaded from the internet or read from a file and in that case would not require modification. – Sam Axe Aug 19 '15 at 00:55
And finally, any string prefixed with `@` needs no escape sequences – D. Ben Knoble Aug 19 '15 at 01:00
You still need them for the double quotes (otherwise how can the compiler know if the string has ended?), but in that case you must use double double-quotes. :D – Andrew Aug 19 '15 at 01:03
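To illustrate the point made in these comments, here is a small sketch comparing the two literal styles for the same token. The `LiteralDemo` class and its constant names are made up for this example.

```csharp
using System;

public static class LiteralDemo
{
    // Regular literal: each embedded double quote is escaped with a backslash.
    public const string Regular = "<div id=\"13\">";

    // Verbatim literal (@ prefix): backslashes are taken literally, and an
    // embedded double quote is written as two double quotes.
    public const string Verbatim = @"<div id=""13"">";

    public static void Main()
    {
        // Both literals produce exactly the same string at run time.
        Console.WriteLine(Regular == Verbatim); // prints "True"
    }
}
```

Neither form is needed when the HTML comes from a file or a web request; escaping only matters for literals typed into source code.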

score 0 · Answer 2 · answered Aug 19 '15 at 00:53

There are many ways to achieve this; which one to use depends on your exact requirements. For example: are you literally searching for <div id="13">, or do you want any div tag with a numeric id? Do you care whether it has other attributes? Do you care about additional whitespace in the text? Are you really working with a string, or are you parsing the HTML?

Below is an example of how you could use a Regex to match the exact string. An advantage of this approach is that it gives you a lot of flexibility, so it should be easy to tweak as your requirements become better defined.

    var regex = new Regex(".*?<div id=\"13\">");
    var test = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head runat=\"server\"><title></title></head><body><table> <tr><td>test</td></tr> </table><div id=\"13\"> </body> test test test test </html>";
    var match = regex.Match(test);
    if (match.Success)
    {
        Console.WriteLine("Found!");
        Console.WriteLine(match.Value);
    }

Full Code:

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        var regex = new Regex(".*?<div id=\"13\">");
        var test = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head runat=\"server\"><title></title></head><body><table> <tr><td>test</td></tr> </table><div id=\"13\"> </body> test test test test </html>";
        var match = regex.Match(test);
        if (match.Success)
        {
            Console.WriteLine("Found!");
            Console.WriteLine(match.Value);
        }
        else
        {
            Console.WriteLine("Not Found!");
        }
        Console.ReadLine();         
    }
}
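As an example of the kind of tweak mentioned above, here is a rough sketch that cuts after the first div whose id is any run of digits, tolerating extra whitespace and letter case. The `DivCutter` class and `CutAfterNumericDiv` method are hypothetical names, and the pattern assumes `id` is the only attribute on the tag and is written with double quotes. If the real markup is less regular than that, an HTML parser is probably a better fit than a regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivCutter
{
    // Lazily matches from the start of the input up to and including the first
    // <div> whose id consists only of digits.
    // RegexOptions.Singleline lets '.' match newlines, so multi-line HTML works.
    private static readonly Regex UpToNumericDiv = new Regex(
        @"^.*?<div\s+id\s*=\s*""\d+""\s*>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Assumption: when no such div exists, the input is returned unchanged.
    public static string CutAfterNumericDiv(string html)
    {
        Match match = UpToNumericDiv.Match(html);
        return match.Success ? match.Value : html;
    }

    public static void Main()
    {
        string test = "<body>\n<table></table><div id=\"13\"> </body> test test";
        Console.WriteLine(CutAfterNumericDiv(test));
    }
}
```

Note that the pattern in the answer above has no `RegexOptions.Singleline`, so there `.` stops at line breaks; that makes no difference for the single-line sample string but would matter for HTML read from a file.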
