
Does anyone know how I can remove the rest of a string after a particular substring or pattern?

For example, I store the HTML in a string as below:

String test;

test = '<html xmlns="http://www.w3.org/1999/xhtml"><head runat="server"><title></title></head><body><table> <tr><td>test</td></tr> </table><div id="13"> </body> test test test test </html>'

How can I remove the rest of the text after <div id="13"> in C# .NET?

slugster
Jin Yong

2 Answers


If you want the ending token to be excluded, you can use this:

string test = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head runat=\"server\"><title></title></head><body><table> <tr><td>test</td></tr> </table><div id=\"13\"> </body> test test test test </html>";
// FirstOrDefault() requires "using System.Linq;" at the top of the file.
string result = test.Split(new string[] { "<div id=\"13\">" }, StringSplitOptions.None).FirstOrDefault();

If you want the ending token to be included, you can use this:

string test = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head runat=\"server\"><title></title></head><body><table> <tr><td>test</td></tr> </table><div id=\"13\"> </body> test test test test </html>";
string endString = "<div id=\"13\">";
string result = test.Substring(0, test.IndexOf(endString) + endString.Length);

Beware that string literals must be enclosed in double quote characters and not apostrophes, and quote characters inside them must be escaped by preceding them with a \.

Also note that my code does no validation at all (for example, it does not check whether the token is actually present); I leave that up to you. :)
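The validation left to the reader above could be sketched roughly as follows. Without a check, `IndexOf` returns -1 when the token is missing, so the `Substring` call would either return a meaningless prefix or throw. The helper name `TruncateAt`, its `includeToken` parameter, and the choice to return the input unchanged when the token is absent are all assumptions made for illustration, not part of the answer.

```csharp
using System;

public static class TruncateDemo
{
    // Hypothetical helper (not from the answer): returns the part of `text`
    // up to the first occurrence of `token`. If `token` is absent, the input
    // is returned unchanged instead of being cut at a wrong position.
    public static string TruncateAt(string text, string token, bool includeToken)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(token))
        {
            return text;
        }

        // Ordinal comparison: match the exact characters, ignoring culture rules.
        int index = text.IndexOf(token, StringComparison.Ordinal);
        if (index < 0)
        {
            return text; // token not found, nothing to remove
        }

        int length = includeToken ? index + token.Length : index;
        return text.Substring(0, length);
    }

    public static void Main()
    {
        string test = "<body><table></table><div id=\"13\"> </body> test test </html>";
        string endString = "<div id=\"13\">";

        // Token kept, token dropped, and a token that does not occur.
        Console.WriteLine(TruncateAt(test, endString, includeToken: true));
        Console.WriteLine(TruncateAt(test, endString, includeToken: false));
        Console.WriteLine(TruncateAt(test, "<div id=\"99\">", includeToken: true));
    }
}
```

Whether a missing token should leave the string untouched, yield an empty string, or throw depends on the caller; returning the input unchanged is just one reasonable choice.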

Andrew
  • They must only be escaped if it's a string literal (as with the split token in `result`). The HTML presented in `test` would most likely be downloaded from the internet or read from a file, and in that case would not require modification. – Sam Axe Aug 19 '15 at 00:55
  • And finally, any string prefixed with `@` needs no escape sequences – D. Ben Knoble Aug 19 '15 at 01:00
  • You still need them for the double quotes (otherwise how can the compiler know if the string has ended?), but in that case you must use double double-quotes. :D – Andrew Aug 19 '15 at 01:03
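To make the point from these comments concrete, here is a small sketch comparing the two literal forms for the token used in the answer. The class and constant names are made up for illustration.

```csharp
using System;

public static class LiteralDemo
{
    // Regular literal: each embedded double quote is escaped with a backslash.
    public const string Regular = "<div id=\"13\">";

    // Verbatim literal (@ prefix): backslashes are taken literally, but an
    // embedded double quote must still be written as two double quotes.
    public const string Verbatim = @"<div id=""13"">";

    public static void Main()
    {
        // Both constants hold exactly the same characters.
        Console.WriteLine(Regular == Verbatim);
        Console.WriteLine(Verbatim);
    }
}
```

None of this applies to text read from a file or downloaded at run time, since escaping only concerns literals in source code.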

There are many ways to achieve this; which one to use depends on your exact requirements. Are you literally searching for <div id="13">, or do you want any div tag with a numeric id? Do you care if it has other attributes? Do you care about additional whitespace in the text? Are you really working with a plain string, or are you parsing the HTML?

Below is an example of how you could use a Regex to match the exact string. An advantage of this approach is it gives you a lot of flexibility, so should be easy to tweak as your requirements become better defined.

    var regex = new Regex(".*?<div id=\"13\">");
    var test = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head runat=\"server\"><title></title></head><body><table> <tr><td>test</td></tr> </table><div id=\"13\"> </body> test test test test </html>";
    var match = regex.Match(test);
    if (match.Success)
    {
        Console.WriteLine("Found!");
        Console.WriteLine(match.Value);
    }

Full Code:

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        var regex = new Regex(".*?<div id=\"13\">");
        var test = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head runat=\"server\"><title></title></head><body><table> <tr><td>test</td></tr> </table><div id=\"13\"> </body> test test test test </html>";
        var match = regex.Match(test);
        if (match.Success)
        {
            Console.WriteLine("Found!");
            Console.WriteLine(match.Value);
        }
        else
        {
            Console.WriteLine("Not Found!");
        }
        Console.ReadLine();
    }
}
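As an illustration of the kind of tweak mentioned above, here is a sketch that assumes the requirement is "keep everything up to the first div with any numeric id", possibly in multi-line HTML. That requirement, and the names `TryKeepUpToDiv` and `UpToNumericDiv`, are assumptions for the example and do not come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTweakDemo
{
    // Assumed requirement: stop at the first <div> whose id is a run of digits.
    // RegexOptions.Singleline lets '.' match newlines, so multi-line input works.
    private static readonly Regex UpToNumericDiv =
        new Regex(@".*?<div\s+id=""\d+""\s*>", RegexOptions.Singleline);

    // On success, `result` holds the text up to and including the matched tag;
    // otherwise it is set to an empty string and the method returns false.
    public static bool TryKeepUpToDiv(string html, out string result)
    {
        Match match = UpToNumericDiv.Match(html);
        result = match.Success ? match.Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        string test = "<body>\n<table></table>\n<div id=\"13\"> </body> test test </html>";

        string kept;
        Console.WriteLine(TryKeepUpToDiv(test, out kept) ? kept : "Not Found!");
    }
}
```

A pattern like this still treats the HTML as plain text; if attributes can appear in any order or the markup is otherwise irregular, an HTML parser is likely the more robust route.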
JohnLBevan