If the expected input values are something like this: 65 66 67 97 98 99, you could just split the input and cast the converted int values to char:
string vWord = "65 66 67 97 98 99";
string result = string.Join("", vWord.Split().Select(n => (char)(int.Parse(n))));
Console.WriteLine($"Result string: {result}");
This method, however, doesn't perform any error checking on the input string. When dealing with user input, this is not a great idea. It is better to use int.TryParse() to validate the input parts:
var result = new StringBuilder();
var ASCIIValues = vWord.Split();
foreach (string CharValue in ASCIIValues) {
    // Accept only non-negative values below 127: a negative int would wrap
    // to an unrelated character when cast to char.
    if (int.TryParse(CharValue, out int n) && n >= 0 && n < 127) {
        result.Append((char)n);
    }
    else {
        Console.WriteLine($"{CharValue} is not a valid input");
        break;
    }
}
Console.WriteLine($"Result string: {result}");
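The validation loop can also be wrapped in a reusable method that follows the TryParse pattern. This is only a sketch: the name TryDecodeAscii is hypothetical (it is not part of the code above), it keeps the same upper bound as the loop, and it is written with top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Text;

// Hypothetical helper (the name is an assumption): returns false on the first invalid part.
static bool TryDecodeAscii(string input, out string decoded)
{
    var sb = new StringBuilder();
    decoded = string.Empty;
    // A null separator splits on whitespace; empty entries from repeated spaces are dropped.
    foreach (string part in input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
    {
        // Reject non-numeric parts, negative values and anything from 127 up.
        if (!int.TryParse(part, out int n) || n < 0 || n >= 127)
            return false;
        sb.Append((char)n);
    }
    decoded = sb.ToString();
    return true;
}

Console.WriteLine(TryDecodeAscii("65 66 67 97 98 99", out string ok) ? ok : "invalid"); // ABCabc
Console.WriteLine(TryDecodeAscii("65 6x 67", out string bad) ? bad : "invalid");        // invalid
```

Returning a bool instead of printing inside the loop leaves the caller free to decide how to report the error.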
You could also use the Encoding.ASCII.GetString method to convert the byte array generated by the byte.Parse method to a string. For example, using LINQ's Select:
string vWord = "65 66 67 97 98 267";
try
{
var CharArray = vWord.Split().Select(n => byte.Parse(n)).ToArray();
string result = Encoding.ASCII.GetString(CharArray);
Console.WriteLine($"String result: {result}");
}
catch (Exception)
{
Console.WriteLine("Not a valid input");
}
This will print "Not a valid input", because one of the values (267) is greater than 255, the maximum a byte can hold.
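Note that byte.Parse only rejects values outside 0 to 255, while ASCII stops at 127. A small sketch of what happens in between, assuming the default Encoding.ASCII behavior of substituting '?' for bytes it cannot decode (the variable names are illustrative):

```csharp
using System;
using System.Linq;
using System.Text;

// 200 fits in a byte, so byte.Parse succeeds, but it is not an ASCII value.
byte[] bytes = "65 200 66".Split().Select(n => byte.Parse(n)).ToArray();

// Encoding.ASCII replaces each byte above 127 with '?' instead of throwing.
string decoded = Encoding.ASCII.GetString(bytes);
Console.WriteLine(decoded); // A?B

// An explicit range check catches those values before decoding.
bool allAscii = bytes.All(b => b <= 127);
Console.WriteLine(allAscii ? decoded : "Not a valid input"); // Not a valid input
```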
Should you decide to allow an input string composed of contiguous values:
651016667979899112101 => "AeBCabcpe"
You could adopt this variation:
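string vWord2 = "11065666797989911210110177";
int step = 2;
var result2 = new StringBuilder();
for (int i = 0; i < vWord2.Length; i += step)
{
    // Guard against a trailing fragment shorter than the current step.
    if (i + step > vWord2.Length) {
        Console.WriteLine($"{vWord2.Substring(i)} is not a valid input");
        break;
    }
    if (int.TryParse(vWord2.Substring(i, step), out int n) && n >= 0 && n < 127)
    {
        if (n <= 12 && step == 2) {
            // Two digits <= 12 are taken as the start of a three-digit code (100-126):
            // widen the step and rewind, so the loop increment re-reads from the same index.
            // Checking step == 2 prevents an endless retry on inputs such as "012".
            step = 3; i -= step;
        }
        else {
            result2.Append((char)n);
            // After a three-digit code, compensate for the step going back to 2.
            if (step == 3) ++i;
            step = 2;
        }
    }
    else {
        Console.WriteLine($"{vWord2.Substring(i, step)} is not a valid input");
        break;
    }
}
Console.WriteLine($"Result string: {result2}");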
Result string: nABCabcpeeM
As Tom Blodget requested, a note about the automatic conversion between the ASCII character set and Unicode CodePoints.
The code above produces ASCII characters from integer values that correspond to entries in the ASCII table: each value is cast to the char type, and the result is converted to a standard Windows Unicode (UTF-16LE) string.
Why is there no need to explicitly convert the ASCII chars to their Unicode representation?
Because, for historical reasons, the lower Unicode CodePoints directly map to the standard ASCII table (the US-ASCII table).
Hence, no conversion is required, or it can be considered implicit.
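This mapping can be checked directly. The following is a minimal sketch (variable names are illustrative) that compares what Encoding.ASCII decodes for every value from 0 to 127 with a plain cast to char:

```csharp
using System;
using System.Linq;
using System.Text;

// For every ASCII value, the decoded character equals the char with the same numeric value.
bool sameCodePoints = Enumerable.Range(0, 128)
    .All(i => Encoding.ASCII.GetString(new[] { (byte)i })[0] == (char)i);
Console.WriteLine(sameCodePoints); // True

// UTF-8 encodes the ASCII range with the very same single bytes.
bool sameBytes = Encoding.UTF8.GetBytes("ABCabc")
    .SequenceEqual(Encoding.ASCII.GetBytes("ABCabc"));
Console.WriteLine(sameBytes); // True

Console.WriteLine((int)'A'); // 65
```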
But, since the .Net string type uses UTF-16LE Unicode internally (which uses one 16-bit code unit for each character in the Basic Multilingual Plane and two 16-bit code units for CodePoints greater than or equal to 2^16), the memory allocation in bytes for the string is double the number of characters.
In the .Net Reference Source, StringBuilder.ToString() will call the internal wstrcpy method:
wstrcpy(char *dmem, char *smem, int charCount)
which will then call Buffer.Memcpy:
Buffer.Memcpy((byte*)dmem, (byte*)smem, charCount * 2);
where the size in bytes is set to charCount * 2.
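The doubling can be observed from managed code without looking at the internals. A short sketch, assuming Encoding.Unicode (the UTF-16LE encoding in .Net) as a stand-in for the in-memory layout of the character data:

```csharp
using System;
using System.Text;

string ascii = "ABCabc";
Console.WriteLine(ascii.Length);                         // 6 chars
Console.WriteLine(Encoding.ASCII.GetByteCount(ascii));   // 6 bytes as ASCII
Console.WriteLine(Encoding.Unicode.GetByteCount(ascii)); // 12 bytes as UTF-16LE

// A CodePoint at or above 2^16 needs two 16-bit code units (a surrogate pair).
string emoji = char.ConvertFromUtf32(0x1F600);
Console.WriteLine(emoji.Length);                         // 2 chars
Console.WriteLine(Encoding.Unicode.GetByteCount(emoji)); // 4 bytes
```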
Since the first draft, in the '80s (when the first Universal Character Set (UCS) was developed), one of the primary objectives of ISO and the Unicode Consortium (the two main entities that were developing the standard) was to preserve compatibility with the pre-existing 256-character set widely used at the time.
Preserving the CodePoints definition, thus preserving compatibility over time, is a strict rule in the Unicode world. This concept and these rules apply to all modern Unicode encodings (UTF-8, UTF-16, UTF-16LE, UTF-32 etc.) and to all CodePoints in the Basic Multilingual Plane (CodePoints in the ranges U+0000 to U+D7FF and U+E000 to U+FFFF).
On the other hand, there's no explicit guarantee that the same Local CodePage encoding (often referred to as ANSI Encoding) will produce the same result on two machines, even when the same System (and System version) is in use.
Some other notes about Localization and the Unicode Common Locale Data Repository (CLDR)