I have a problem extracting part of a string. Here are the details:

I have this code to read a web page and put it in a string:

System.Net.WebRequest request = System.Net.WebRequest.Create(textBox1.Text);

using (System.Net.WebResponse response = request.GetResponse())
{
    using (System.IO.Stream stream = response.GetResponseStream())
    {
        using (StreamReader sr = new StreamReader(stream))
        {
            html = sr.ReadToEnd();
        }
    }
}

Now I would like to extract only some parts of this string. How can I do that? When I use Substring, it does not return the pieces I selected.

Example of a substring code:

Name = html.Substring((html.IndexOf("og:title")+19), (html.Substring(html.IndexOf("og:title") +19).FirstOrDefault(x=> x== '>')));

I would like it to start after "og:title" and stop at the '>', but it doesn't work.

For example, the result is:

"Valchiria “Intera” Pendragon\">\n<meta property=\"og:image\" conte"
GSerg
Mrpit

1 Answer

It is easier to use a library for this; for example, you can take a look at this

If I understood what you want, your code should look like the following:

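static void Main(string[] args)
{
    const string startingToken = "og:title\"";
    const string endingToken = "\">";

    var html = "<html><meta property=\"og:title\" Valchiria “Intera” Pendragon\">\n<meta property=\"og:image\" content></html>";

    // Find the starting token and drop everything up to and including it.
    var indexWhereOgTitleBegins = html.IndexOf(startingToken);
    if (indexWhereOgTitleBegins < 0)
    {
        Console.WriteLine("Starting token not found");
        return;
    }
    var htmlTrimmedHead = html.Substring(indexWhereOgTitleBegins + startingToken.Length);

    // The wanted text ends where the ending token first appears.
    var indexOfTheEndingToken = htmlTrimmedHead.IndexOf(endingToken);
    if (indexOfTheEndingToken < 0)
    {
        Console.WriteLine("Ending token not found");
        return;
    }

    var parsedText = htmlTrimmedHead.Substring(0, indexOfTheEndingToken).TrimStart(' ').TrimEnd(' ');

    Console.WriteLine(parsedText);
}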
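As a side note on why the original Substring call returns too much: FirstOrDefault(x => x == '>') returns the character '>' itself, not its position, and C# implicitly converts that char to the int 62, so the call always takes 62 characters. The following is a minimal sketch of the same idea as above, using the IndexOf overload that accepts a start index so no intermediate string is needed. The class name TextBetween, the method name Extract and the sample HTML are hypothetical, and the sketch assumes the tag has the form property="og:title" content="...".

```csharp
using System;

static class TextBetween
{
    // Hypothetical helper: returns the text between startToken and the
    // next endToken that follows it, or null if either token is missing.
    public static string Extract(string source, string startToken, string endToken)
    {
        int start = source.IndexOf(startToken, StringComparison.Ordinal);
        if (start < 0) return null;

        // Move past the start token itself.
        start += startToken.Length;

        // Search for the end token only after the start position.
        int end = source.IndexOf(endToken, start, StringComparison.Ordinal);
        if (end < 0) return null;

        // Substring takes a start index and a LENGTH, not an end index.
        return source.Substring(start, end - start);
    }

    static void Main()
    {
        // Sample input, assumed for illustration only.
        string html = "<meta property=\"og:title\" content=\"Valchiria Pendragon\">\n"
                    + "<meta property=\"og:image\" content=\"x\">";

        Console.WriteLine(Extract(html, "og:title\" content=\"", "\">"));
    }
}
```

The key point is that the second argument of Substring is a length, so it has to be computed as the end index minus the start index.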

Note that you can also use regular expressions to achieve the same result in fewer lines of code, but regular expressions are not always easy to maintain.
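A rough sketch of the regular expression variant could look like the following. It assumes the property attribute comes before the content attribute and that both use double quotes, which real pages do not guarantee; the class name OgTitleRegex and the sample HTML are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class OgTitleRegex
{
    // Hypothetical helper: returns the content of the og:title meta tag,
    // or null when the pattern does not match.
    public static string Extract(string html)
    {
        // Group 1 captures everything up to the next straight double quote.
        Match m = Regex.Match(
            html,
            "property=\"og:title\"\\s+content=\"([^\"]*)\"",
            RegexOptions.IgnoreCase);

        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Sample input, assumed for illustration only.
        string html = "<meta property=\"og:title\" content=\"Valchiria Pendragon\">\n"
                    + "<meta property=\"og:image\" content=\"x\">";

        Console.WriteLine(Extract(html));
    }
}
```

If the attribute order or quoting varies between pages, this pattern silently stops matching, which is one reason a dedicated HTML parsing library tends to be more robust.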

Take a look at this answer: Parsing HTML String

Your question title is probably not accurate, because the question is really about HTML parsing.

Norcino
  • I guess your `.TrimStart(' ').TrimEnd(' ')` is equivalent to `.Trim(' ')`. – Jeppe Stig Nielsen Jul 10 '19 at 11:15
  • @JeppeStigNielsen, correct, but the purpose of this code is just to give an idea of what you would need to handle to get what I think was required; code such as the one above would not be production code. – Norcino Jul 10 '19 at 12:43