
If an HTML message is sent via email, a plain-text alternative has to be attached as well (at least some spam detection software checks for one). How can I convert HTML to plain text?

        HtmlDocument document = new HtmlDocument();
        // LoadHtml parses an HTML string; Load expects a file path or a stream.
        document.LoadHtml(htmlBody);
        string plainBody = document.DocumentNode.InnerText;

This returns plain text, but all link targets are lost.

E.g.:

HTML Version

<a href="#">Hello World</a>

should result in

Hello World (#)

But it results in

Hello World
Bin4ry
  • Does this answer your question? [How do I remove all HTML tags from a string without knowing which tags are in it?](https://stackoverflow.com/questions/18153998/how-do-i-remove-all-html-tags-from-a-string-without-knowing-which-tags-are-in-it) – DCCoder Sep 14 '20 at 00:12
  • @DCCoder no, as I am asking for a lib to do it for me. – Bin4ry Sep 14 '20 at 15:48
  • @Blastfurnace yes. This is exactly what I have posted below, in order to answer my own question. – Bin4ry Sep 14 '20 at 16:48
  • I'm aware of that, it's why I flagged this as a duplicate question... – Blastfurnace Sep 14 '20 at 18:04

1 Answer

As far as I know, InnerText returns the text between an element's start and end tags; it does not include attribute values.

If you want an attribute value, you have to extract it yourself. You could select the href attribute value of every a tag and then append it to the matching link text in the inner text.

For more details, refer to the code below.

I used the HtmlAgilityPack package, which you can install from NuGet: https://www.nuget.org/packages/HtmlAgilityPack/

        var doc = new HtmlDocument();
        doc.LoadHtml(@"<html><body><div id='foo'>text<a href='#'>Hello World</a> <a href='#'>test</a></div></body></html>");

        var innerText = doc.DocumentNode.InnerText;

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//a[@href]");

        if (nodes != null)
        {
            foreach (var item in nodes)
            {
                var href = item.GetAttributeValue("href", string.Empty);

                // Append the link target after the link text, e.g. "Hello World (#)".
                // Note: Replace touches every occurrence of the link text in the string.
                innerText = innerText.Replace(item.InnerText, item.InnerText + string.Format(" ({0})", href));
            }
        }
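
Because string replacement touches every occurrence of the link text, an alternative is to rewrite the links in the parsed document before reading InnerText. The following is only a minimal sketch of that idea, assuming HtmlAgilityPack is installed; the class name HtmlToText and the method name ToPlainTextWithLinks are made up for illustration and are not part of the library.

```csharp
using System;
using HtmlAgilityPack;

public static class HtmlToText
{
    // Hypothetical helper: returns the document text with each link's
    // target appended in parentheses after the link text.
    public static string ToPlainTextWithLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when no node matches the XPath.
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (var link in links)
            {
                var href = link.GetAttributeValue("href", string.Empty);

                // Swap the <a> element for a text node holding "text (href)".
                var replacement = doc.CreateTextNode(link.InnerText + " (" + href + ")");
                link.ParentNode.ReplaceChild(replacement, link);
            }
        }

        // Decode entities such as &amp; for the plain-text body.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

Calling HtmlToText.ToPlainTextWithLinks on the snippet from the question is intended to give the "Hello World (#)" form asked for. Each link is handled exactly once, so repeated link texts do not interfere with each other.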

Result: (screenshot not included)

Brando Zhang
  • Thanks for your time and effort. However, I am looking for an "out of the box" solution. Reinventing the wheel isn't what I was looking for. Please check out my suggested solution above. – Bin4ry Sep 14 '20 at 15:50