1

Is it possible to extract just the text content from formatted HTML/CSS code? I have this content:

<div class="ExternalClass0909250B34584AE5AA58772B3064DCD5">
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Solution (SO_)= lml</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Product (PR_)= slider</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Team (T_) = kehrberger</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">&#160;</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Anforderer = renner</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">&#160;</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Erfolgt ein FAK – Einsatz? Nein&#160; </p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Werksvertrag </p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Beistellung relevant?&#160; nein</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">&#160;</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Anlieferadresse&#58;</p>
    <p style="margin&#58;0in;color&#58;black;font-family&#58;arial;font-size&#58;10pt;">Bürocampus Wangen - Kofi Warenannahme</p>
    <p style="margin&#58;0in;color&#58;black;font-family&#58;arial;font-size&#58;10pt;">Joachim Renner (Daimler AG)</p>
    <p style="margin&#58;0in;color&#58;black;font-family&#58;arial;font-size&#58;10pt;">Hedelfinger Str. 60</p>
    <p style="margin&#58;0in;color&#58;black;font-family&#58;arial;font-size&#58;10pt;">70327 Stuttgart</p>
</div>

When I simply paste this into the text window here, it is rendered like this:

<div class="ExternalClass0909250B34584AE5AA58772B3064DCD5">
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Solution (SO_)= lml</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Product (PR_)= slider</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Team (T_) = kehrberger</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">&#160;</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Anforderer = renner</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">&#160;</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Erfolgt ein FAK – Einsatz? Nein&#160; </p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Werksvertrag </p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Beistellung relevant?&#160; nein</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">&#160;</p>
    <p style="margin&#58;0in;font-family&#58;calibri;font-size&#58;11pt;">Anlieferadresse&#58;</p>
    <p style="margin&#58;0in;color&#58;black;font-family&#58;arial;font-size&#58;10pt;">Bürocampus Wangen - Kofi Warenannahme</p>
    <p style="margin&#58;0in;color&#58;black;font-family&#58;arial;font-size&#58;10pt;">Max Mustermann (Company)</p>
    <p style="margin&#58;0in;color&#58;black;font-family&#58;arial;font-size&#58;10pt;">Musterstraße 60</p>
    <p style="margin&#58;0in;color&#58;black;font-family&#58;arial;font-size&#58;10pt;">12345 Musterstadt</p>
</div>

Is it possible in C# to get just the text as shown in the yellow box?

Thank you.

tech2017
Jan021981
  • 1
    Just about anything is possible. Have you looked into libraries that understand HTML and tried to use them? What was the result of that? – mason Aug 11 '17 at 17:58
  • You could also try using an XML parser. .NET has a few built in. – Adam Schiavone Aug 11 '17 at 18:00
  • 1
    @AdamSchiavone [HTML is not XML](https://stackoverflow.com/questions/5558502/is-html5-valid-xml). – mason Aug 11 '17 at 18:01
  • @mason You're absolutely correct. However, the example he gives is valid XHTML. But yes, in general you need a parser that can handle the quirks of HTML. – Adam Schiavone Aug 11 '17 at 18:06
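
To illustrate the XML-parser suggestion from the comments: a minimal sketch, assuming the input is well-formed XML like the snippet in the question. The class name `XmlTextExtractor` and the method name `ExtractParagraphText` are made up for this example. It relies only on `System.Xml.Linq` from .NET; numeric references such as `&#160;` and `&#58;` are decoded by the parser, whereas named HTML entities such as `&nbsp;` or unclosed tags would make `XDocument.Parse` throw.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class XmlTextExtractor
{
    // Returns the text of every <p> element, one list entry per paragraph.
    public static List<string> ExtractParagraphText(string xhtml)
    {
        // Throws System.Xml.XmlException if the markup is not well-formed XML.
        XDocument doc = XDocument.Parse(xhtml);

        return doc.Descendants("p")
            // Value is the concatenated text of the element, already decoded.
            // Non-breaking spaces are mapped to ordinary spaces before trimming.
            .Select(p => p.Value.Replace('\u00A0', ' ').Trim())
            .ToList();
    }
}
```

Joining the result with `string.Join(Environment.NewLine, ...)` would give the plain text line by line. For arbitrary real-world HTML this approach is fragile, which is the point mason makes above.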

4 Answers

0

A quick Google search for "C# HTML Parser" yielded at least these two resources:

Please read the documentation, and come back here if you have specific errors you need help dealing with.

Guillaume CR
0

I'd use the Html Agility Pack. Add it through NuGet. Once you have that, you can do something like the following:

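using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(yourHtml);
// SelectSingleNode takes an XPath string and returns null if nothing matches; "//div" is only an example.
HtmlNode myNode = doc.DocumentNode.SelectSingleNode("//div");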

Now you can extract the text from myNode, its child or parent nodes, or any of its attributes, such as the styles.
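
For the markup in the question, a slightly fuller sketch might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced; the class name `HtmlTextExtractor` and the method name `ExtractParagraphText` are hypothetical and chosen only for this example. It collects the text of every `<p>` element and decodes entities such as `&#160;` and `&#58;`.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper, not part of Html Agility Pack itself.
public static class HtmlTextExtractor
{
    // Returns the visible text of each <p> element, one list entry per paragraph.
    public static List<string> ExtractParagraphText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var lines = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
            return lines;

        foreach (HtmlNode p in paragraphs)
        {
            // InnerText leaves entities encoded, so decode them first, then
            // replace non-breaking spaces with ordinary spaces and trim.
            string text = HtmlEntity.DeEntitize(p.InnerText)
                .Replace('\u00A0', ' ')
                .Trim();
            lines.Add(text);
        }

        return lines;
    }
}
```

Calling `string.Join(Environment.NewLine, HtmlTextExtractor.ExtractParagraphText(yourHtml))` should then give the plain text one paragraph per line, with the `&#160;`-only paragraphs showing up as empty lines.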

chris-crush-code
0

It's not actively maintained anymore, but you could try using CsQuery.

I've had some luck with it in the past, doing similar things.

Adam Schiavone
0

Thank you very much - it was a misunderstanding. Most examples on the web use an HtmlDocument class. A class with that name is provided by the System.Windows.Forms library, but the examples actually use the "HtmlAgilityPack" library, whose class has the same name.
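
One way to avoid this kind of name clash, sketched here under the assumption that a project references both System.Windows.Forms and the HtmlAgilityPack package, is a `using` alias. The alias name `HapDocument` and the class `DocumentLoader` are invented for this example.

```csharp
// Alias the Html Agility Pack class so it cannot be confused with
// System.Windows.Forms.HtmlDocument when both namespaces are imported.
using HapDocument = HtmlAgilityPack.HtmlDocument;

// Hypothetical helper, shown only to demonstrate the alias in use.
public static class DocumentLoader
{
    // Parses an HTML string with Html Agility Pack and returns the document.
    public static HapDocument Load(string html)
    {
        var doc = new HapDocument();
        doc.LoadHtml(html);
        return doc;
    }
}
```

Writing the fully qualified name `HtmlAgilityPack.HtmlDocument` at each use would work just as well; the alias only saves typing.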

My problem is solved - thank you all.

Jan021981