I have a string containing many different characters, and I want to get a string that contains only the permitted characters.
For example, given the string "geeks01$سهیلاطریقی03.02geeks!@!!.", I want to return "01سهیلاطریقی03.02@.".
The following class detects valid characters, and it works correctly, but I can't find an expression for Persian characters.
Does anyone have an idea how to fix this, or a solution with better performance? I care about bottlenecks because it must run about 8,000,000 times :)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

namespace ConsoleApp1
{
    class Program
    {
        public static void Main()
        {
            string str = "geeks01$سهیلاطریقی03.02geeks!@!!.";
            splitString(str, true, false, true, new char[] { '@', '.' });
        }
        static string splitString(string str, bool keepNumber, bool keepEnglishAlpha, bool keepPersianbAlpha, char[] special)
        {
            StringBuilder value =
                     new StringBuilder();
            for (int i = 0; i < str.Length; i++)
            {
                if (Char.IsDigit(str[i]) && keepNumber == true)
                    value.Append(str[i]);

                if (keepEnglishAlpha == true)
                    if ((str[i] >= 'A' && str[i] <= 'Z') || (str[i] >= 'a' && str[i] <= 'z'))
                        value.Append(str[i]);

                if (keepPersianbAlpha == true)
                {
                    //todo
                }
                if (special.Length >= 1)
                {
                    foreach (var specialChar in special)
                    {
                        if (str[i] == specialChar)
                            value.Append(str[i]);
                    }
                }

            }
            return value.ToString();
        }
    }
}
Soheila Tarighi
  • Maybe this can help: https://stackoverflow.com/questions/10561590/regex-for-check-the-input-string-is-just-in-persian-language – i486 Jul 05 '20 at 10:59
  • Thank you, but I want to use C#, not JavaScript. @i486 – Soheila Tarighi Jul 05 '20 at 11:00
  • This regex has an error at `\s` @viveknuna **Error CS1009 Unrecognized escape sequence** – Soheila Tarighi Jul 05 '20 at 11:05
  • 1
    @SoheilaTarighi try this `Regex.IsMatch(Text, @"^[\u0600-\u06ff\s]+$|[\u0750-\u077f\s]+$|[\ufb50-\ufc3f\s]+$|[\ufe70-\ufefc\s]+$|[\u06cc\s]+$|[\u067e\s]+$|[\u06af\s]$|[\u0691\s]+$|^$");` – Vivek Nuna Jul 05 '20 at 11:06
  • Could you share some other examples? – osman Rahimi Jul 05 '20 at 11:09
  • Is this working as expected? Your `splitString` method is invoked, but you don't assign the result to any variable. You have to assign it first, for example back to `str`, and then you can decide how to use it: `str = splitString(str,...` – gurkan Jul 05 '20 at 11:35
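
The regex idea from the comments can also be turned around: instead of testing whether a whole string is Persian, remove every character that is not permitted. The following is only a sketch under assumptions: the class name `RegexFilter`, the method `KeepAllowed`, and the hard-coded set (ASCII digits, the Arabic Unicode block U+0600-U+06FF, `@` and `.`) are illustrative and not part of the original code. The Arabic block is broader than the Persian alphabet, since it also contains Arabic-only letters, digits, and diacritics. Writing the pattern as a verbatim string (`@"..."`) avoids the CS1009 escape-sequence error mentioned above.

```csharp
using System.Text.RegularExpressions;

public static class RegexFilter
{
    // Hypothetical pattern: matches any character that is NOT an ASCII digit,
    // NOT in the Arabic Unicode block (U+0600-U+06FF), and NOT '@' or '.'.
    private static readonly Regex NotAllowed =
        new Regex(@"[^0-9\u0600-\u06FF@.]", RegexOptions.Compiled);

    // Removes every character matched by the pattern above.
    public static string KeepAllowed(string input) =>
        NotAllowed.Replace(input, string.Empty);
}
```

For the string from the question, `RegexFilter.KeepAllowed("geeks01$سهیلاطریقی03.02geeks!@!!.")` should return `01سهیلاطریقی03.02@.`. Whether a regex is fast enough for 8,000,000 calls would need to be measured.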

1 Answer

You can use the Enumerable.Aggregate method to improve the function and return the desired output based on the specified conditions.

using System;
using System.Globalization;
using System.Linq;
using System.Text; // StringBuilder
//...

// Suggested rename: the method filters characters rather than splitting.
static string FilterString(
    string str,
    bool keepNumber,
    bool keepEnglishAlpha,
    bool keepPersianbAlpha,
    char[] special
    ) =>
    str.Aggregate(new StringBuilder(), (sb, c) =>
        (keepNumber && char.IsDigit(c))
        || (keepEnglishAlpha && ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')))
        // OtherLetter (Lo) covers Persian and Arabic letters, but also other
        // caseless scripts such as Hebrew and CJK.
        || (keepPersianbAlpha && char.GetUnicodeCategory(c) == UnicodeCategory.OtherLetter)
        || (special != null && special.Contains(c))
        ? sb.Append(c) : sb).ToString();

Testing with the string from the question:

public static void Main()
{
    var input = "geeks01$سهیلاطریقی03.02geeks!@!!.";
    var output = FilterString(input, true, false, true, new[] { '@', '.' });

    Console.WriteLine(output);
}

Writes:

01سهیلاطریقی03.02@.
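
Since the question mentions roughly 8,000,000 calls, a plain loop may be worth comparing against the `Aggregate` version, because it avoids the per-character delegate call and pre-sizes the `StringBuilder`. This is a sketch, not a measured result: the names `FastFilter`, `Filter`, and `IsArabicBlockLetter` are hypothetical, and "Persian letter" is approximated here as any letter in the Arabic Unicode block (U+0600-U+06FF), which also includes Arabic-only letters.

```csharp
using System;
using System.Text;

public static class FastFilter
{
    // Assumption: a "Persian letter" is any letter in the Arabic Unicode block
    // (U+0600-U+06FF). This range also contains letters used only in Arabic.
    private static bool IsArabicBlockLetter(char c) =>
        c >= '\u0600' && c <= '\u06FF' && char.IsLetter(c);

    public static string Filter(string str, bool keepNumber, bool keepEnglishAlpha,
                                bool keepPersianAlpha, char[] special)
    {
        // The result can never be longer than the input, so pre-size the buffer.
        var sb = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            bool keep =
                (keepNumber && char.IsDigit(c)) ||
                (keepEnglishAlpha && ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))) ||
                (keepPersianAlpha && IsArabicBlockLetter(c)) ||
                (special != null && Array.IndexOf(special, c) >= 0);

            if (keep)
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Called as `FastFilter.Filter(input, true, false, true, new[] { '@', '.' })` with the input above, it should produce the same result as `FilterString`. Unlike the `OtherLetter` check, the range test does not let letters from unrelated scripts through; which of the two is faster in practice would have to be benchmarked.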
  • 1
    Thank you, your solution works correctly. How can I replace the Arabic ي ك with the Persian ی ک? @JQSOFT – Soheila Tarighi Jul 05 '20 at 16:00
  • 1
    @SoheilaTarighi Most welcome Ma`am. Could you please give me a string with these characters to check it out? –  Jul 05 '20 at 16:14
  • 1
    For example, `علي` should convert to `علی` and `شکاك` to `شکاک`. How can I detect the Arabic `ي ك` in the input variable and convert them to the Persian `ی ک`? For example, for **"geeks01$سهيلاطريقي03.02geeks!@!!."** I want to return **01سهیلاطریقی03.02@.**, with the Arabic `ي` removed and replaced by the Persian `ی`. – Soheila Tarighi Jul 05 '20 at 16:34
  • 1
    @SoheilaTarighi Use: `var s = "شکاك".Replace("\u0643", "\u06A9");` and `var s = "علي".Replace("\u064A", "\u06CC");`. See these links for more: [Arabic Alphabet](https://en.wikipedia.org/wiki/Arabic_script_in_Unicode) and [Persian Alphabet](https://en.wikipedia.org/wiki/Persian_alphabet). –  Jul 05 '20 at 17:07
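
The replacement discussed in the comments above can be wrapped in a small helper that runs before filtering. This is a minimal sketch with a hypothetical class name `PersianNormalizer`. It assumes the desired targets are the Persian yeh U+06CC (ARABIC LETTER FARSI YEH) and the Persian kaf U+06A9 (ARABIC LETTER KEHEH), and it handles only those two letters.

```csharp
public static class PersianNormalizer
{
    // Maps the Arabic forms of yeh and kaf to the forms used in Persian.
    public static string Normalize(string input) =>
        input
            .Replace('\u064A', '\u06CC')  // Arabic yeh (ي) -> Farsi yeh (ی)
            .Replace('\u0643', '\u06A9'); // Arabic kaf (ك) -> keheh (ک)
}
```

With this helper, `PersianNormalizer.Normalize("علي")` should give `علی`, and `PersianNormalizer.Normalize("شکاك")` should give `شکاک`. Using the `char` overload of `Replace` keeps the call independent of culture-sensitive string comparison.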