3

I am trying to convert the Western Arabic numerals in plain text into Eastern Arabic digits, i.e. taking 1 2 3... and converting them into ١ ٢ ٣.... The problem is that my function converts every digit, including digits inside tags, e.g. the 1 in H1.

        private void LoadHtmlFile(object sender, EventArgs e)
        {
            var htmlfile = "<html><body><h1>i was born in 1988</h1></body></html>".ToArabicNumber();
            webBrowser1.DocumentText = htmlfile;
        }
    }

    public static class StringHelper
    {
        public static string ToArabicNumber(this string str)
        {
            if (string.IsNullOrEmpty(str)) return "";
            char[] chars = str.ToCharArray();
            for (int i = 0; i < chars.Length; i++)
            {
                if (chars[i] >= '0' && chars[i] <= '9')
                {
                    // '0' is 48, and 48 + 1728 = 1776 (U+06F0), so this maps onto U+06F0..U+06F9
                    chars[i] += (char)1728;
                }
            }
            return new string(chars);
        }
    }

I also tried targeting only the numbers in InnerText, but that did not work either. The code below changes the numbers in tags as well.

private void LoadHtmlFile(object sender, EventArgs e)
{
    var htmlfile = "<html><body><h1>i was born in 1988</h1></body></html>";
    webBrowser1.DocumentText = htmlfile;
}

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    webBrowser1.Document.Body.InnerText = webBrowser1.Document.Body.InnerText.ToArabicNumber();
}

Any suggestions?

amadib
KF2

4 Answers

2

You can use a regular expression to find the parts of the HTML that lie between '>' and '<' characters, and operate only on those. This prevents the code from processing tag names and attributes (style, etc.).

using System.Linq;
using System.Text.RegularExpressions;

// Convert all English digits in a string to Arabic digit equivalents
public static string ToArabicNums(string src)
{
    const string digits = "۰۱۲۳۴۵۶۷۸۹";
    return string.Join("",
        src.Select(c => c >= '0' && c <= '9' ? digits[c - '0'] : c)
    );
}

// Convert all English digits in the text segments of an HTML 
// document to Arabic digit equivalents
public static string ToArabicNumsHtml(string src)
{
    string res = src;

    // Note: '.' does not match newlines by default; pass
    // RegexOptions.Singleline if text segments can span lines
    Regex re = new Regex(@">(.*?)<");

    // get Regex matches 
    MatchCollection matches = re.Matches(res);

    // process in reverse in case transformation function returns 
    // a string of a different length
    for (int i = matches.Count - 1; i >= 0; --i)
    {
        Match nxt = matches[i];
        if (nxt.Groups.Count == 2 && nxt.Groups[1].Length > 0)
        {
            Group g = nxt.Groups[1];
            res = res.Substring(0, g.Index) + ToArabicNums(g.Value) +
                res.Substring(g.Index + g.Length);
        }
    }

    return res;
}

This isn't perfect: it doesn't check for HTML character references outside of the tags, such as the &#<digits>; construct that specifies a character by its Unicode value (&#1777; for ۱, etc.), and it will replace the digits inside them. It also won't process any text before the first tag or after the last tag.

Sample:

Calling: ToArabicNumsHtml("<html><body><h1>I was born in 1988</h1></body></html>")
Result: "<html><body><h1>I was born in ۱۹۸۸</h1></body></html>"

Use whatever code you prefer in ToArabicNums to do the actual transformation, or generalize it by passing in a transformation function.
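As a rough illustration of that generalization, here is a minimal sketch in JavaScript (the language several other answers here use; it is not a port of the C# above). The names `convertHtmlText` and `toEasternDigits` are hypothetical. It skips tags and character references such as &#1777; and also converts text before the first tag and after the last one. The skip pattern is an assumption: it ignores comments, the contents of script and style elements, and any '>' inside attribute values.

```javascript
// Hypothetical helper: map ASCII digits onto U+06F0..U+06F9,
// the same digit set as the `digits` string in ToArabicNums.
function toEasternDigits(text) {
    return text.replace(/[0-9]/g, (d) => String.fromCharCode(0x06F0 + Number(d)));
}

// Apply `transform` only to the text segments of `html`.
// Assumption: tags look like <...> and character references
// look like &name; or &#1777; or &#x6F1;.
function convertHtmlText(html, transform) {
    const skip = /(<[^>]*>|&#?[0-9A-Za-z]+;)/;
    // split() with one capturing group returns text and separators
    // alternately, so the skipped pieces sit at the odd indexes.
    return html
        .split(skip)
        .map((part, i) => (i % 2 === 1 ? part : transform(part)))
        .join("");
}

const sample = "Born 1988 <h1 id=\"t1\">in 1988 &#1777;</h1> aged 25";
console.log(convertHtmlText(sample, toEasternDigits));
```

Passing the transformation in as a parameter keeps the tag-skipping logic independent of which digit set you target.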

Corey
0

Use regular expressions. Here is the JavaScript code I myself use:

function toIndic(n) {
    var ns = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹'];

    return n.toString().replace(/\d/g, function (m) { 
        return ns[m];
    });
}

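To make sure you only convert standalone numbers, you can use a stricter regular expression: \b[0-9]+\b. Note that this matches a whole run of digits at once, so the replacement callback then has to map each digit of the match.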
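A minimal sketch of how that stricter pattern could be wired up, assuming the same `ns` table as above; `toIndicWholeNumbers` is a hypothetical name. It leaves the 1 in h1 alone because there is no word boundary between "h" and "1", but it would still convert a standalone number inside an attribute value (width="100") and would skip digits glued to letters in ordinary text (A4, 25px).

```javascript
const ns = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹'];

// Convert only digit runs that are delimited by word boundaries.
function toIndicWholeNumbers(text) {
    return String(text).replace(/\b[0-9]+\b/g, (run) =>
        // `run` may hold several digits, so map each one separately.
        Array.from(run, (d) => ns[Number(d)]).join("")
    );
}

console.log(toIndicWholeNumbers("<h1>i was born in 1988</h1>"));
```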

alijsh
  • This also changes the `1` in the `H1` tag – KF2 Feb 14 '13 at 05:59
  • 1
    Parsing HTML with regular expressions is semi-acceptable if you know the exact structure of the HTML and need to pick very limited, well-defined elements. It looks like @irsog wants generic code, and regex is a bad idea for that: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Alexei Levenkov Feb 14 '13 at 06:36
0

This function can convert English digits to Persian, Arabic, and Urdu digits:

function convertDigitIn(enDigit){ // PERSIAN, ARABIC, URDU
    var newValue="";
    for (var i=0;i<enDigit.length;i++)
    {
        var ch=enDigit.charCodeAt(i);
        if (ch>=48 && ch<=57)
        {
            // european digit range; 48 + 1584 = 1632 (U+0660, ARABIC-INDIC DIGIT ZERO)
            var newChar=ch+1584;
            newValue=newValue+String.fromCharCode(newChar);
        }
        else
            newValue=newValue+String.fromCharCode(ch);
    }
    return newValue;
}
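One caveat worth spelling out: the offset 1584 lands on U+0660..U+0669, the Arabic-Indic digits, whereas Persian and Urdu text normally uses the Extended Arabic-Indic digits at U+06F0..U+06F9. A minimal sketch of a variant that takes the target block as a parameter follows; `convertDigits` and the two constants are hypothetical names, not part of the function above.

```javascript
// Code points of the zero digit in each Unicode block.
const ARABIC_INDIC_ZERO = 0x0660;          // Arabic
const EXTENDED_ARABIC_INDIC_ZERO = 0x06F0; // Persian, Urdu

// Shift ASCII digits onto the block that starts at `zeroCodePoint`;
// every other character is copied through unchanged.
function convertDigits(text, zeroCodePoint) {
    let out = "";
    for (const ch of text) {
        const code = ch.codePointAt(0);
        out += code >= 48 && code <= 57
            ? String.fromCodePoint(zeroCodePoint + (code - 48))
            : ch;
    }
    return out;
}

console.log(convertDigits("1988", ARABIC_INDIC_ZERO));
console.log(convertDigits("1988", EXTENDED_ARABIC_INDIC_ZERO));
```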
  • +0: your function is correct, but unfortunately, if you read the question, it is "how to find all text values in HTML and apply a known transformation", which has nothing to do with the title... – Alexei Levenkov Feb 14 '13 at 06:34
0

Just add this at the end of your document; it will work fine :-)

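<script type="text/javascript">
    // Requires jQuery for $(document).ready
    $(document).ready(function() {
        // Character references for the Arabic-Indic digits U+0660..U+0669
        var map = ["&#1632;","&#1633;","&#1634;","&#1635;","&#1636;","&#1637;","&#1638;","&#1639;","&#1640;","&#1641;"];

        document.body.innerHTML = document.body.innerHTML.replace(
            // a digit with no '<' or '>' before the next '<' or the end of input, i.e. a digit outside a tag
            /\d(?=[^<>]*(<|$))/g,
            function($0) { return map[$0]; }
        );
    });
</script>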
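Note that assigning to innerHTML makes the browser re-parse the whole body, so event listeners attached to the old elements are lost. If that matters, an alternative is to walk the DOM and change only text nodes. This is a minimal sketch, not a drop-in replacement for the script above: `convertTextNodes` and `toArabicIndic` are hypothetical names, and skipping script and style elements is my own assumption about what you would want.

```javascript
// nodeType values defined by the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Apply `transform` to every text node under `root`, in place.
// Assumption: contents of <script> and <style> are left untouched.
function convertTextNodes(root, transform) {
    for (const node of Array.from(root.childNodes)) {
        if (node.nodeType === TEXT_NODE) {
            node.nodeValue = transform(node.nodeValue);
        } else if (node.nodeType === ELEMENT_NODE &&
                   !/^(script|style)$/i.test(node.nodeName)) {
            convertTextNodes(node, transform);
        }
    }
}

// ASCII digits to Arabic-Indic digits (U+0660..U+0669), as in `map` above.
const toArabicIndic = (s) =>
    s.replace(/[0-9]/g, (d) => String.fromCharCode(0x0660 + Number(d)));

// In a browser, once the document has loaded:
// convertTextNodes(document.body, toArabicIndic);
```

Because only nodeValue is changed, tag names, attributes, and existing elements stay exactly as they were.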
Bashir Noori