1

I have a string that has extended ASCII char ÿ in it and I am trying to delete it. How do I find it in a string and delete it from a string like this: 1ÿ1ÿ0ÿÿÿ?

The byte array, buffer = { 49, 0, 255, 255, 49, 0, 255, 255, 48, 0, 255, 255, 255, 255 }

I am using C# and the string was formed from a byte array like so: temp.Add(System.Text.Encoding.ASCII.GetString(buffer));

Then, the first item in temp is "1\0??1\0??0\0????"

I would like to remove the non-ASCII values from the string, or better yet the buffer.

NexAddo
  • "Extended ASCII" isn't a very specific term; various different encodings are called "extended ASCII" by different people. It sounds like you just mean "non-ASCII". Where did this data come from and do you *really* want to just discard it all? What is the byte array - could it be that you're just using the wrong encoding? – Jon Skeet Oct 31 '11 at 14:23
  • The question is not clear: do you need to remove character 255 (that character is probably 255), replace it with another one, or understand why it happens? – Salvatore Previti Oct 31 '11 at 14:24
  • Does http://stackoverflow.com/questions/3698885/how-to-remove-non-ascii-word-from-a-string-in-c-sharp do it? – Aaron Anodide Oct 31 '11 at 14:40

4 Answers

2

To replace every non-ASCII character in a string, where '?' is your replacement character:

var clean = new string("1ÿ1ÿ0ÿÿÿ".Select(c => c > 127 ? '?' : c).ToArray());

Or, if you want to remove those characters instead:

var clean = new string("1ÿ1ÿ0ÿÿÿ".Where(c => c <= 127).ToArray());

Update

In response to your update, you can drop the non-ASCII bytes from your buffer and build a string from what remains, as follows:

string clean = new string(buffer.Where(b => b <= 127).Select(b => (char)b).ToArray());
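As a rough sketch of what this does to the buffer from the question: filtering on `b <= 127` drops the 255 bytes but keeps the zero bytes, so the result still contains NUL characters. The class name `AsciiFilterDemo` and the helper `KeepPrintableAscii` are hypothetical additions for illustration; whether the NULs should also go depends on where the data came from.

```csharp
using System;
using System.Linq;

public static class AsciiFilterDemo
{
    // Keeps only bytes in the 7-bit ASCII range (0..127) and maps each one to a char.
    public static string KeepAscii(byte[] buffer)
    {
        return new string(buffer.Where(b => b <= 127).Select(b => (char)b).ToArray());
    }

    // Hypothetical extra step: keep only printable ASCII (32..126),
    // which also drops control characters such as NUL (0).
    public static string KeepPrintableAscii(byte[] buffer)
    {
        return new string(buffer.Where(b => b >= 32 && b <= 126).Select(b => (char)b).ToArray());
    }

    public static void Main()
    {
        byte[] buffer = { 49, 0, 255, 255, 49, 0, 255, 255, 48, 0, 255, 255, 255, 255 };

        Console.WriteLine(KeepAscii(buffer).Length);   // 6: '1', NUL, '1', NUL, '0', NUL
        Console.WriteLine(KeepPrintableAscii(buffer)); // 110
    }
}
```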
Tim Lloyd
1

Is there anything wrong with Replace?

temp = temp.Replace("ÿ", "");
Salvatore Previti
rerun
1

To remove the characters from the buffer before creating the string:

byte[] buffer = new byte[] { 49, 0, 255, 255, 49, 0, 255, 255, 48, 0, 255, 255, 255, 255 };
var cleanBuffer = buffer.Where((b) => b < 128).ToArray();
string temp = Encoding.ASCII.GetString(cleanBuffer);

If you try to convert it to a string and then remove the offending characters, you can't tell the difference between a legitimate ? character and one that was placed there because conversion failed. That is, if your buffer contained:

{ 63, 63, 49, 0, 255, 255, 49, 0, 255, 255, 48, 0, 255, 255, 255, 255 }

Then the resulting string would start with ??1\0??. The first two question marks are legitimate, but the last two are the result of conversion failure.
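The difference can be seen by counting question marks for both orderings. This is only a sketch on a shortened version of the buffer above; the class and method names are made up for the example.

```csharp
using System;
using System.Linq;
using System.Text;

public static class ReplacementAmbiguityDemo
{
    // Decode first, then strip '?': this also removes question marks
    // that were genuinely present in the data.
    public static string DecodeThenStrip(byte[] buffer)
    {
        return Encoding.ASCII.GetString(buffer).Replace("?", "");
    }

    // Strip the non-ASCII bytes first, then decode: genuine '?' bytes (63) survive.
    public static string StripThenDecode(byte[] buffer)
    {
        byte[] clean = buffer.Where(b => b < 128).ToArray();
        return Encoding.ASCII.GetString(clean);
    }

    public static void Main()
    {
        byte[] buffer = { 63, 63, 49, 0, 255, 255 };

        // Plain decoding: two real '?' plus two '?' substituted for the 255 bytes.
        Console.WriteLine(Encoding.ASCII.GetString(buffer).Count(c => c == '?')); // 4
        Console.WriteLine(StripThenDecode(buffer).Count(c => c == '?'));          // 2
        Console.WriteLine(DecodeThenStrip(buffer).Count(c => c == '?'));          // 0
    }
}
```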

Jim Mischel
0

Convert the string to ASCII with an empty replacement fallback, so that only those characters that are ASCII are kept.

    ' http://stackoverflow.com/questions/123336/how-can-you-strip-non-ascii-characters-from-a-string-in-c
    Public Shared Function GetAsciiString(ByVal strInputString As String) As String
        Dim strASCII As String = System.Text.Encoding.ASCII.GetString( _
            System.Text.Encoding.Convert( _
                System.Text.Encoding.UTF8, _
                System.Text.Encoding.GetEncoding( _
                    System.Text.Encoding.ASCII.EncodingName, _
                    New System.Text.EncoderReplacementFallback(String.Empty), _
                    New System.Text.DecoderExceptionFallback()), _
                System.Text.Encoding.UTF8.GetBytes(strInputString)))

        Return strASCII
    End Function


    Public Shared Function IsAscii(ByVal strInputString As String) As Boolean
        'Dim strInputString As String = "Räksmörgås"
        If (GetAsciiString(strInputString) = strInputString) Then
            Return True
        End If

        Return False
    End Function

Edit: Here is the C# version:

// http://stackoverflow.com/questions/123336/how-can-you-strip-non-ascii-characters-from-a-string-in-c
public static string GetAsciiString(string strInputString)
{
    string strASCII = System.Text.Encoding.ASCII.GetString(
        System.Text.Encoding.Convert(
            System.Text.Encoding.UTF8,
            System.Text.Encoding.GetEncoding(
                System.Text.Encoding.ASCII.EncodingName,
                new System.Text.EncoderReplacementFallback(string.Empty),
                new System.Text.DecoderExceptionFallback()),
            System.Text.Encoding.UTF8.GetBytes(strInputString)));

    return strASCII;
}


public static bool IsAscii(string strInputString)
{
    // Example: "Räksmörgås" contains non-ASCII characters, so this returns false for it.
    return GetAsciiString(strInputString) == strInputString;
}
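A short usage sketch of the same idea, condensed into one self-contained class. The class name `AsciiStringDemo` is hypothetical, and the encoding is looked up by the name "us-ascii" rather than through `EncodingName`, which is an assumption made for brevity.

```csharp
using System;
using System.Text;

public static class AsciiStringDemo
{
    public static string GetAsciiString(string input)
    {
        // ASCII encoding whose encoder replaces unencodable characters with
        // an empty string, i.e. silently drops them.
        Encoding asciiDroppingUnknown = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty),
            new DecoderExceptionFallback());

        byte[] asciiBytes = Encoding.Convert(
            Encoding.UTF8, asciiDroppingUnknown, Encoding.UTF8.GetBytes(input));

        return Encoding.ASCII.GetString(asciiBytes);
    }

    public static bool IsAscii(string input)
    {
        return GetAsciiString(input) == input;
    }

    public static void Main()
    {
        Console.WriteLine(GetAsciiString("R\u00e4ksm\u00f6rg\u00e5s")); // Rksmrgs
        Console.WriteLine(IsAscii("R\u00e4ksm\u00f6rg\u00e5s"));        // False
        Console.WriteLine(IsAscii("1?1?0"));                            // True
    }
}
```

Note that this works on a string that still contains the original characters. A string already decoded with Encoding.ASCII, like the one in the question, holds real '?' characters in place of the 255 bytes, and those are ASCII, so they pass through unchanged.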
Stefan Steiger
  • Have you copied that code from what looks like a duplicate question? http://stackoverflow.com/questions/123336/how-can-you-strip-non-ascii-characters-from-a-string-in-c – Tim Lloyd Oct 31 '11 at 14:37
  • @chibacity: I copied it from code I wrote 5 years ago. But I have published it on the web, so the code you referenced probably originated from me, and not vice-versa. – Stefan Steiger Oct 31 '11 at 14:42
  • I was just wondering why you had the SO question link. It's probably better to flag the question for closing as a duplicate than to copy the code to a duplicate question? – Tim Lloyd Oct 31 '11 at 14:45