Suppose I have a Bangla string like "সাজানো". I need to split it into "সা", "জা", "নো". I have tried the ToCharArray() method, but it splits the string into 'স', 'া', 'জ', 'া', 'ন', 'ো'.

So the problem is that I want to split a string into a string array that keeps combined/dependent characters together. For example, "সা" should be separated from "সাজানো" as one unit, not as the individual chars 'স' and 'া'.

ssakash
  • You cannot split Indian languages like that. Check this SO post [Converting Unicode string to unicode chars in c# for indian languages](http://stackoverflow.com/questions/13966487/converting-unicode-string-to-unicode-chars-in-c-sharp-for-indian-languages) for a solution – bansi Sep 15 '14 at 05:27

2 Answers


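Try this. It splits on text elements (a base character plus its combining marks) rather than on 'া', because String.Split drops the separator and would return "স", "জ", "নো":

// Requires: using System.Collections.Generic; using System.Globalization;
string[] SplitString(string S)
{
    var parts = new List<string>();
    // Each text element is a base character with its combining marks attached.
    TextElementEnumerator e = StringInfo.GetTextElementEnumerator(S);
    while (e.MoveNext())
    {
        parts.Add(e.GetTextElement());
    }
    return parts.ToArray();
}

private void button1_Click(object sender, EventArgs e)
{
    string B = "সাজানো";
    var vv = SplitString(B);
    foreach (var item in vv)
    {
        MessageBox.Show(item);
    }
}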
Nic
Mehdi Khademloo

This works too:

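// Requires: using System; using System.Collections.Generic; using System.Globalization;
string text = "সাজানো";
var tokens = new List<string>();

foreach (char c in text)
{
    UnicodeCategory category = char.GetUnicodeCategory(c);
    // Vowel signs such as 'া' are spacing marks; others (e.g. 'ু') are non-spacing marks.
    bool isMark = category == UnicodeCategory.SpacingCombiningMark
               || category == UnicodeCategory.NonSpacingMark;

    if (isMark && tokens.Count > 0)
    {
        // Attach the combining mark to the token that precedes it.
        tokens[tokens.Count - 1] += c;
    }
    else
    {
        // Start a new token (also covers a stray mark at the very start).
        tokens.Add(c.ToString());
    }
}

foreach (string token in tokens)
    Console.WriteLine(token);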

The code is kind of sloppy. It's late.

You can also test for surrogates (char.IsHighSurrogate / char.IsLowSurrogate) if you're working with characters outside the Basic Multilingual Plane.
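As a rough sketch of what the surrogate test could look like, the loop below attaches a low surrogate to the high surrogate before it, in addition to attaching combining marks to their base character. The class name `TokenSplitter` and the method name `Split` are made up for illustration. It assumes well-formed UTF-16 input and does not handle combining marks that are themselves encoded as surrogate pairs, nor conjuncts joined by a virama.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of any library.
static class TokenSplitter
{
    public static List<string> Split(string text)
    {
        var tokens = new List<string>();

        foreach (char c in text)
        {
            UnicodeCategory category = char.GetUnicodeCategory(c);
            bool isMark = category == UnicodeCategory.SpacingCombiningMark
                       || category == UnicodeCategory.NonSpacingMark;

            // A low surrogate completes a pair only if the previous
            // token currently ends with a high surrogate.
            bool completesPair = false;
            if (char.IsLowSurrogate(c) && tokens.Count > 0)
            {
                string last = tokens[tokens.Count - 1];
                completesPair = char.IsHighSurrogate(last[last.Length - 1]);
            }

            if ((isMark && tokens.Count > 0) || completesPair)
            {
                // Keep the mark or low surrogate with what precedes it.
                tokens[tokens.Count - 1] += c;
            }
            else
            {
                tokens.Add(c.ToString());
            }
        }

        return tokens;
    }
}
```

Calling `TokenSplitter.Split("সাজানো")` should give the three tokens "সা", "জা", "নো", and a character stored as a surrogate pair should come back as a single token instead of two broken halves.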

Nic