
I have some HTML text like the following:

<span style="font-weight: 700;">Aanbod wielen (banden + velgen) </span>
<br><br>
<span style="font-weight: 500;">lichtmetalen originele Volvo set met winterbanden:<br>origineel:</span> Volvo<br>
<b>inch maat:</b> 15''<br>
<p>steek:</p> 5x108mm<br>
<span style="font-weight: 700;">naafgat:</span>

In C#, I need to find every span tag that has an inline font-weight style and replace it with a <b> tag, replacing the closing </span> tag with </b> as well. The result should look like this:

<b>Aanbod wielen (banden + velgen)</b>
<br><br>
<b>lichtmetalen originele Volvo set met winterbanden:<br>origineel:</b> Volvo <br>
<b>inch maat:</b> 15''<br>
<p>steek:</p> 5x108mm<br>
<b>naafgat:</b>

How can I identify and replace these tags? Any help is appreciated.

er-sho
Urvish Patel
  • what is `Idetified`? – jazb Oct 26 '18 at 05:51
  • Have you ever heard of regular expressions? – Tobias Oct 26 '18 at 05:53
  • You need to be clearer about what you want; vague requirements will result in generic answers (like my answer below). Any span? Spans with specific attributes (like font-weight)? Why do you need it? To be parsed by another process? To display parts in bold? That sort of information. – Tomer W Oct 26 '18 at 06:01
  • @TomerW A span with a font-weight style should be replaced with b – Urvish Patel Oct 26 '18 at 06:07

2 Answers

You can replace each span with a b element by using HtmlAgilityPack, which is free and open source.

You can install HtmlAgilityPack from NuGet: Install-Package HtmlAgilityPack -Version 1.8.9

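// Needed at the top of the file:
using System.IO;
using HtmlAgilityPack;

public string ReplaceSpanByB()
{
    HtmlDocument doc = new HtmlDocument();

    string htmlContent = File.ReadAllText(@"C:\Users\xxx\source\repos\ConsoleApp4\ConsoleApp4\Files\HTMLPage1.html");

    doc.LoadHtml(htmlContent);

    // SelectNodes returns null (not an empty collection) when nothing matches,
    // so query once and check the result before iterating.
    HtmlNodeCollection spans = doc.DocumentNode.SelectNodes("//span");

    if (spans != null)
    {
        foreach (HtmlNode node in spans)
        {
            string style = node.GetAttributeValue("style", string.Empty);

            // Only spans whose inline style sets font-weight are replaced.
            if (style.Contains("font-weight"))
            {
                // Create a <b> element, copy the span's contents into it,
                // and swap it in for the span.
                HtmlNode b = doc.CreateElement("b");
                b.InnerHtml = node.InnerHtml;

                node.ParentNode.ReplaceChild(b, node);
            }
        }
    }

    return doc.DocumentNode.OuterHtml;
}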


er-sho

1st: Don't use regex. Although it is possible and seems like the logical choice,
it is mostly wrong and full of pain.
An entertaining post about it can be found HERE

2nd:
Use an HTML parser such as https://html-agility-pack.net/ to traverse the tree
(you can use XPath to easily find all the span elements you want to replace)
and replace each matching span element with a b element (don't forget to copy the span's contents into the new b element).
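
As a rough illustration of the steps above, here is a minimal sketch that takes the HTML as a string. It assumes the HtmlAgilityPack NuGet package is installed; the class name SpanToBold and the method name ReplaceFontWeightSpans are made up for this example, and the XPath filter simply checks whether the style attribute mentions font-weight at all, whatever its value.

```csharp
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper; the class and method names are illustrative only.
public static class SpanToBold
{
    public static string ReplaceFontWeightSpans(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: every <span> whose style attribute contains "font-weight".
        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection spans =
            doc.DocumentNode.SelectNodes("//span[contains(@style, 'font-weight')]");

        if (spans == null)
        {
            return html; // nothing to replace, hand the input back unchanged
        }

        // Walk the matches backwards so that a nested span is replaced
        // before the span that contains it.
        for (int i = spans.Count - 1; i >= 0; i--)
        {
            HtmlNode span = spans[i];

            HtmlNode b = doc.CreateElement("b");
            b.InnerHtml = span.InnerHtml; // copy the span's contents into the new element

            span.ParentNode.ReplaceChild(b, span);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling SpanToBold.ReplaceFontWeightSpans with the HTML from the question should turn each styled span into a b element and leave the p and br tags alone. Spans with any other inline style are not touched by this filter.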

Side note: As far as I recall, the b tag is discouraged,
so if you only need the span text to be bold,
it already is because of the inline font-weight style (700 is equivalent to bold).

On https://developer.mozilla.org/en-US/docs/Web/HTML/Element/b :

"Historically, the <b> element was meant to make text boldface. Styling information has been deprecated since HTML4, so the meaning of the <b> element has been changed." and "The HTML Bring Attention To element (<b>) is used to draw the reader's attention to the element's contents, which are not otherwise granted special importance." – Thanks @Richardissimo

Tomer W
  • I've upvoted, but I'm not sure about your side note. Can you cite a reference for `b` being discouraged? (It's been in HTML since the start...) Found it, but it's not the `b` tag being deprecated, it's only the way you use it... https://developer.mozilla.org/en-US/docs/Web/HTML/Element/b – Richardissimo Oct 26 '18 at 06:10
  • Quotes from that page: *"Historically, the `<b>` element was meant to make text boldface. Styling information has been deprecated since HTML4, so the meaning of the `<b>` element has been changed."* and *"The HTML Bring Attention To element (`<b>`) is used to draw the reader's attention to the element's contents, which are not otherwise granted special importance."* – Richardissimo Oct 26 '18 at 06:21