
I would like to implement functionality that inserts a word-breaking tag when a word is too long to fit on a single line.

    // Requires: using System; using System.Text; (for StringBuilder)
    protected string InstertWBRTags(string text, int interval)
    {
        if (String.IsNullOrEmpty(text) || interval < 1 || text.Length <= interval)
        {
            return text;
        }
        int pS = 0, pE = 0, tLength = text.Length;
        StringBuilder sb = new StringBuilder(tLength * 2);

        while (pS < tLength)
        {
            pE = pS + interval;
            if (pE >= tLength)
            {
                // Last chunk: append the rest without a trailing break.
                sb.Append(text.Substring(pS));
            }
            else
            {
                sb.Append(text.Substring(pS, pE - pS));
                sb.Append("&#8203;"); // zero-width space; <wbr> is not supported by IE 8
            }
            pS = pE;
        }
        return sb.ToString();
    }

The problem is: what can I do if the text contains HTML-encoded special characters? How can I prevent a tag from being inserted inside an entity such as `&szlig;`? And how can I count the real string length (as it appears in the browser)? A string like `&#9825;&#9829;` (♡♥) contains only 2 characters (hearts) in the browser, but its length is 14.
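To make the problem concrete, here is a minimal Java port of the fixed-interval approach above (the class name `NaiveBreaker` is hypothetical, and Java is used only to match the Java answer in this thread). Because it counts the UTF-16 `char`s of the raw HTML, it can cut an entity in half:

```java
class NaiveBreaker {
    // Illustration only: inserts "&#8203;" after every `interval` chars of the
    // raw HTML string, exactly like the question's method, so it has no idea
    // where an entity such as "&szlig;" begins or ends.
    static String insertBreaks(String text, int interval) {
        if (text == null || interval < 1 || text.length() <= interval) {
            return text;
        }
        StringBuilder sb = new StringBuilder(text.length() * 2);
        for (int start = 0; start < text.length(); start += interval) {
            int end = Math.min(start + interval, text.length());
            sb.append(text, start, end);
            if (end < text.length()) {
                sb.append("&#8203;"); // no break after the last chunk
            }
        }
        return sb.toString();
    }
}
```

For example, `insertBreaks("Stra&szlig;e", 4)` returns `"Stra&#8203;&szl&#8203;ig;e"`, which a browser can no longer render as ß, and `"&#9825;&#9829;".length()` is 14 even though only two hearts are displayed.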

Rocco Hundertmark

2 Answers


One solution would be to decode the entities into the Unicode characters they represent and work with those. To do that, use `System.Net.WebUtility.HtmlDecode()` if you're on .NET 4, or `System.Web.HttpUtility.HtmlDecode()` otherwise.
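Java has no standard-library counterpart to `HtmlDecode`, so the following is only a minimal sketch of the decoding step under stated assumptions: it handles decimal (`&#9825;`) and hexadecimal (`&#x2661;`) references plus a hypothetical five-entry table of named entities, and leaves anything it does not recognize untouched. The class name `EntityDecoder` is illustrative; a real decoder knows the full list of HTML named entities.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    // Hypothetical, deliberately tiny table of named entities.
    private static final Map<String, Integer> NAMED = Map.of(
            "amp", (int) '&', "lt", (int) '<', "gt", (int) '>',
            "quot", (int) '"', "szlig", 0x00DF);

    // Matches &#decimal; , &#xhex; and &name; (digit limits keep parseInt in range).
    private static final Pattern ENTITY = Pattern.compile(
            "&(#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6}|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuilder out = new StringBuilder(html.length());
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start()); // copy text before the entity
            String body = m.group(1);
            Integer cp;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                cp = Integer.parseInt(body.substring(2), 16);
            } else if (body.startsWith("#")) {
                cp = Integer.parseInt(body.substring(1));
            } else {
                cp = NAMED.get(body); // null if the name is not in the table
            }
            if (cp == null || !Character.isValidCodePoint(cp)) {
                out.append(m.group()); // unknown or invalid: keep it verbatim
            } else {
                out.appendCodePoint(cp); // may append a surrogate pair
            }
            last = m.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }
}
```

After decoding, `"&#9825;&#9829;"` becomes a two-`char` string, so the break-insertion logic can count characters normally.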

But be aware that not every Unicode character fits in one `char`: characters outside the Basic Multilingual Plane take two `char`s (a surrogate pair).
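A minimal Java sketch of that caveat (the class `CodePointBreaker` is hypothetical): it works on already-decoded text, walks it by code point with `String.codePointAt` and `Character.charCount`, and inserts a raw U+200B zero-width space so a surrogate pair is never split. Emitting the raw character instead of `&#8203;` assumes the page is served in a Unicode encoding such as UTF-8.

```java
class CodePointBreaker {
    // Inserts U+200B (zero-width space) after every `interval` code points
    // of already-decoded text. Counting code points rather than chars keeps
    // both halves of a surrogate pair together.
    static String insertBreaks(String text, int interval) {
        if (text == null || interval < 1) {
            return text;
        }
        StringBuilder sb = new StringBuilder(text.length() * 2);
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);    // one full character, 1 or 2 chars
            sb.appendCodePoint(cp);
            i += Character.charCount(cp);    // advance past the whole code point
            count++;
            if (count % interval == 0 && i < text.length()) {
                sb.append('\u200B');         // no break after the last character
            }
        }
        return sb.toString();
    }
}
```

For instance, `"a\uD83D\uDE00b"` (a, an emoji, b) has a `length()` of 4 but only 3 code points, and with `interval = 1` each of the three gets its own break opportunity.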

svick
  • The `HtmlEncode` and `HtmlDecode` methods aren't symmetrical; decoding will convert the entities into single characters, but encoding won't convert all of these characters back into entities. Also, if the source text contains characters such as `<` and entities such as `&lt;`, then there's no way of distinguishing those after decoding. – Niels van der Rest Jul 21 '10 at 14:31
  • I meant that he shouldn't use `HtmlDecode` at all. But that would require the output to be Unicode. – svick Jul 21 '10 at 15:16

You need to walk through the whole text character by character. When you find a `&`, examine what follows: if it is a `#`, then everything up to the next semicolon should be a number (you can check that as well). In that case, move your iterator to the nearest semicolon and increment the counter once for the whole entity.

In Java:

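    int count = 0;

    for (int i = 0; i < text.length(); i++) {
        if (text.charAt(i) == '&') {
            int semicolon = text.indexOf(';', i); // end of the entity, if any
            if (semicolon != -1) {
                i = semicolon; // skip the entity; the loop's i++ moves past ';'
            }
        }
        count++; // one visible character (a whole entity counts as one)
    }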

This is a very simplified version: it assumes every `&` starts an entity that ends at the next `;`.
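Building on the counting loop above, here is a sketch that answers the original question directly: it inserts the question's `&#8203;` break after every `interval` visible characters, counts each entity as one character, and never places a break inside an entity. The class and method names are hypothetical, and the entity pattern is an assumption covering named, decimal, and hexadecimal references; it does not implement the full HTML parsing rules (for example, legacy named entities written without a trailing semicolon are treated as plain text).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityAwareBreaker {
    // Same zero-width-space entity the question inserts.
    static final String BREAK = "&#8203;";

    // A well-formed character reference: &name; , &#123; or &#x1F;
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    // Inserts BREAK after every `interval` visible characters. An entity counts
    // as one character and is never split; a '&' that does not start a
    // well-formed entity is treated as a literal character.
    static String insertBreaks(String text, int interval) {
        if (text == null || interval < 1) {
            return text;
        }
        Matcher m = ENTITY.matcher(text);
        StringBuilder sb = new StringBuilder(text.length() * 2);
        int visible = 0;
        int i = 0;
        while (i < text.length()) {
            int end = i + 1; // exclusive end of the current visible unit
            char c = text.charAt(i);
            if (c == '&') {
                m.region(i, text.length());
                if (m.lookingAt()) {
                    end = m.end(); // take the whole entity as one unit
                }
            } else if (Character.isHighSurrogate(c) && end < text.length()
                    && Character.isLowSurrogate(text.charAt(end))) {
                end++; // keep a surrogate pair together
            }
            sb.append(text, i, end);
            i = end;
            visible++;
            if (visible % interval == 0 && i < text.length()) {
                sb.append(BREAK); // no break after the last unit
            }
        }
        return sb.toString();
    }
}
```

With this version, `insertBreaks("Stra&szlig;e", 4)` returns `"Stra&#8203;&szlig;e"`, and validating the entity with a pattern (instead of jumping to any later `;`) keeps a stray `&` in ordinary text from swallowing the characters that follow it.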