You could start with this code, which uses the LINQ Select extension method:
string str = "hello world!";
string a = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
a += a.ToLower();
char[] alphabet = a.ToCharArray();
str = string.Join("",
    str.Select(ch => alphabet.Contains(ch)
        ? ch.ToString()
        // cast to int: the x4 format specifier has no effect on a char
        : String.Format("_x{0:x4}_", (int)ch)).ToArray()
);
Now clearly it has some problems:
- it does a linear search in the list of characters
- it misses digits
- once digits are added, we need to decide whether the first character may be a digit (assuming yes)
- it creates a large number of strings that are immediately discarded (one per character)
- alphanumeric is limited to ASCII (assuming that is OK; if not, Char.IsLetterOrDigit will help)
- it does too much work for purely alphanumeric strings
The first two are easy: we can use a HashSet (O(1) Contains) initialized with the full list of characters (if any alphanumeric character is acceptable, it is more readable to use the existing method Char.IsLetterOrDigit):
public static HashSet<char> asciiAlphaNum = new HashSet<char>
("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");
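To make the difference between the two checks concrete, here is a minimal sketch. The class name AlphaNumChecks and both method names are hypothetical and not part of the original code; only the character set and Char.IsLetterOrDigit come from the text above.

```csharp
using System;
using System.Collections.Generic;

public static class AlphaNumChecks
{
    // Same ASCII-only set as above (string is an IEnumerable<char>).
    public static readonly HashSet<char> AsciiAlphaNum = new HashSet<char>(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");

    // ASCII-only check backed by the hash set.
    public static bool IsAsciiAlphaNum(char ch)
    {
        return AsciiAlphaNum.Contains(ch);
    }

    // Unicode-aware check: also accepts non-ASCII letters and digits.
    public static bool IsUnicodeAlphaNum(char ch)
    {
        return char.IsLetterOrDigit(ch);
    }
}
```

For example, '\u00e9' (an accented Latin letter) is rejected by IsAsciiAlphaNum but accepted by IsUnicodeAlphaNum, while '_' is rejected by both. Which behavior is right depends on what the escaped strings are used for.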
To avoid ch.ToString(), which pointlessly produces strings for immediate GC, we need to figure out how to construct a string from a mix of char and string values. String.Join does not work because it needs strings to start with, and the regular new string(...) constructors have no option for a mix of char and string. So we are left with StringBuilder, which happily takes both in Append. Consider starting with an initial capacity of str.Length if most strings contain no other characters.
So for each character we just need to call either builder.Append(ch) or builder.AppendFormat("_x{0:x4}_", (int)ch). For the iteration it is easier to use a regular foreach, but if one really wants LINQ, Enumerable.Aggregate is the way to go.
string ReplaceNonAlphaNum(string str)
{
    var builder = new StringBuilder();
    foreach (var ch in str)
    {
        if (asciiAlphaNum.Contains(ch))
            builder.Append(ch);
        else
            builder.AppendFormat("_x{0:x4}_", (int)ch);
    }
    return builder.ToString();
}

string ReplaceNonAlphaNumLinq(string str)
{
    return str.Aggregate(new StringBuilder(), (builder, ch) =>
        asciiAlphaNum.Contains(ch) ?
            builder.Append(ch) : builder.AppendFormat("_x{0:x4}_", (int)ch)
    ).ToString();
}
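As a self-contained illustration of the foreach approach with the suggested initial capacity, here is a sketch that compiles on its own. The class name PresizedEscaper is hypothetical; the logic is the same as in ReplaceNonAlphaNum, with only the StringBuilder capacity added.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PresizedEscaper
{
    private static readonly HashSet<char> AsciiAlphaNum = new HashSet<char>(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");

    public static string ReplaceNonAlphaNum(string str)
    {
        // Pre-size for the common case where few characters need escaping;
        // the builder still grows if many characters are replaced.
        var builder = new StringBuilder(str.Length);
        foreach (char ch in str)
        {
            if (AsciiAlphaNum.Contains(ch))
                builder.Append(ch);
            else
                builder.AppendFormat("_x{0:x4}_", (int)ch);
        }
        return builder.ToString();
    }
}
```

With this sketch, PresizedEscaper.ReplaceNonAlphaNum("hello world!") returns "hello_x0020_world_x0021_", since the space is U+0020 and the exclamation mark is U+0021.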
As for the last point: we don't need to do anything if there is nothing to convert, so a check like the one in "check alphanumeric characters in string in c#" helps avoid the extra allocations.
Thus the final version (LINQ, as it is a bit shorter and fancier):
// \z instead of $: in .NET, $ also matches before a trailing newline,
// which would let "abc\n" through unescaped.
private static readonly Regex asciiAlphaNumRx = new Regex(@"^[a-zA-Z0-9]*\z");
public static HashSet<char> asciiAlphaNum = new HashSet<char>
    ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");

string ReplaceNonAlphaNumLinq(string str)
{
    return asciiAlphaNumRx.IsMatch(str) ? str :
        str.Aggregate(new StringBuilder(), (builder, ch) =>
            asciiAlphaNum.Contains(ch) ?
                builder.Append(ch) : builder.AppendFormat("_x{0:x4}_", (int)ch)
        ).ToString();
}
Alternatively, the whole thing could be done with Regex; see "Regex replace: Transform pattern with a custom function" for a starting point.
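A minimal sketch of that Regex-based alternative, assuming the same ASCII-only definition of alphanumeric; the class name RegexEscaper is hypothetical. It relies on the Regex.Replace overload that takes a MatchEvaluator delegate, which is called once per match and returns the replacement text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexEscaper
{
    // Matches any single char that is not an ASCII letter or digit
    // (a negated character class also matches newlines).
    private static readonly Regex NonAlphaNum = new Regex("[^a-zA-Z0-9]");

    public static string ReplaceNonAlphaNum(string str)
    {
        // The lambda is converted to a MatchEvaluator; each match is
        // exactly one char, so Value[0] is the character to escape.
        return NonAlphaNum.Replace(str,
            match => string.Format("_x{0:x4}_", (int)match.Value[0]));
    }
}
```

RegexEscaper.ReplaceNonAlphaNum("hello world!") returns "hello_x0020_world_x0021_", the same result as the StringBuilder versions. It is shorter, but it still formats one small string per replaced character, so whether it is preferable depends on how often escaping is actually needed.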