
I have a string that contains Unicode data.

I want to write it to a file. When the data is written, the file contains `\u` escape sequences instead of the actual non-English text.

string originalString = ((char)(buffer[index])).ToString();
//sb.Append(DecodeEncodedNonAsciiCharacters(originalString.ToString()));
foreach (char c1 in originalString)
{
    // test if char is ascii, otherwise convert to Unicode Code Point
    int cint = Convert.ToInt32(c1);
    if (cint <= 127 && cint >= 0)
        asAscii.Append(c1.ToString());
    else
    {
        //String s = Char.ConvertFromUtf32(cint);
        asAscii.Append(String.Format("\\u{0:x4} ", cint).Trim());
       // asAscii.Append(s);
    }
}

sb.Append((asAscii));
Console.WriteLine();

When I open the output file, the data looks like this:

1 00:00:27,709-->00:00:32,959 1.2 \u00e0\u00a4\u0085\u00e0\u00a4\u00b0\u00e0\u00a4\u00ac \u00e0\u00a4\u00b2\u00e0\u00a5\u008b\u00e0\u00a4\u0097 28 \u00e0\u00a4\u00b0\u00e0\u00a4\u00be\u00e0\u00a4\u009c\u00e0\u00a5\u008d\u00e0\u00a4\u00af \u00e0\u00a4\u0094\u00e0\u00a4\u00b0 \u00e0\u00a4\u00b8\u00e0\u00a4\u00be\u00e0\u00a4\u00a4 \u00e0\u00a4\u0095\u00e0\u00a5\u0087\u00e0\u00a4\u0082\u00e0\u00a4\u00a6\u00e0\u00a5\u008d\u00e0\u00a4\u00b0 \u00e0\u00a4\u00b6\u00e0\u00a4\u00be\u00e0\u00a4\u00b8\u00e0\u00a4\u00bf\u00e0\u00a4\u00a4 \u00e0\u00a4\u00aa\u00e0\u00a5\u008d\u00e0\u00a4\u00b0\u00e0\u00a4\u00a6\u00e0\u00a5\u0087\u00e0\u00a4\u00b6

But it should look like this:

1 00:00:27,400 --> 00:00:32,760 1.2 अरब लोग 28 राज्य और सात केंद्र शासित प्रदेश

I have tried many things, but none of them worked.

AnkushSeth
  • [MSDN: How to: Convert Between Hexadecimal Strings and Numeric Types](https://msdn.microsoft.com/en-us/library/bb311038.aspx). You should show what you have tried. – Sayse May 18 '15 at 06:37
  • Unicode is a proper encoding for strings. Just saying... – Zohar Peled May 18 '15 at 06:43
  • @PradnyaBolli: Linking to google is considered 'not constructive'. – Patrick Hofman May 18 '15 at 06:43
  • The code that reads the string is wrong and must be fixed. The read code used the wrong encoding. The default encoding for streams is ASCII, and you must specify Unicode encoding in this case. – jdweng May 18 '15 at 06:48
  • None of the tricks worked for me, but I highly appreciate your quick response. I am still looking for a solution. – AnkushSeth May 18 '15 at 09:22
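
The escapes in the question's output are consistent with UTF-8 bytes being cast to `char` one at a time. The following is a minimal sketch of that effect, not the asker's actual program: it assumes the buffer holds UTF-8 data, uses a hypothetical three-byte buffer containing only the letter अ (U+0905), and is written with C# 9 top-level statements.

```csharp
using System;
using System.Text;

// Hypothetical buffer: the UTF-8 bytes of the single Devanagari letter U+0905.
byte[] buffer = { 0xE0, 0xA4, 0x85 };

// Casting byte by byte turns each byte into its own char (U+00E0, U+00A4, U+0085),
// which is why three escapes appear for one letter.
var perByte = new StringBuilder();
foreach (byte b in buffer)
{
    perByte.AppendFormat("\\u{0:x4}", (int)(char)b);
}

// Decoding the whole sequence as UTF-8 yields the single intended character.
string decoded = Encoding.UTF8.GetString(buffer);

Console.WriteLine(perByte.ToString());          // \u00e0\u00a4\u0085
Console.WriteLine(decoded.Length);              // 1
Console.WriteLine("U+{0:X4}", (int)decoded[0]); // U+0905
```

The first printed line matches the first three escapes in the output file shown in the question, which suggests the bytes were never decoded as a multi-byte sequence.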

1 Answer

using System;
using System.Text;

string unicodeString = "This string contains the unicode character Pi(\u03a0)";

// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;

// Convert the string into a byte[].
byte[] unicodeBytes = unicode.GetBytes(unicodeString);

// Perform the conversion from one encoding to the other.
// Characters with no ASCII equivalent, such as Pi, come out as '?'.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);

// Convert the new byte[] into a char[] and then into a string.
// This is a slightly different approach to converting to illustrate
// the use of GetCharCount/GetChars.
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);

// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Ascii converted string: {0}", asciiString);
Jerin
  • Thanks for the reply, but I have tried this and the result is: 1 00:00:27,709-->00:00:32,959 1.2 िरब लॿि 28 राि्य िर सात िॿिद्र शासित प्रदॿश – AnkushSeth May 18 '15 at 09:15
  • But it should be like this: 1 00:00:27,400 --> 00:00:32,760 1.2 अरब लोग 28 राज्य और सात केंद्र शासित प्रदेश – AnkushSeth May 18 '15 at 09:19
  • It's for UTF8 decoding. You might need to apply it using the encoding that your data was originally encoded with. – Jerin May 18 '15 at 10:34
  • Hello all, I am stuck with this problem. When I save the buffer as it is, the data is shown properly, but I need to extract the data from the buffer and save it into a string. I think the data is probably being lost while converting the byte array into a string with the proper encoding. – AnkushSeth May 18 '15 at 13:14
  • Have you checked out these two links? https://msdn.microsoft.com/en-us/goglobal/bb688114.aspx and http://stackoverflow.com/questions/1922199/c-sharp-convert-string-from-utf-8-to-iso-8859-1-latin1-h After finding the encoding value, you can decode it. – Jerin May 19 '15 at 05:21
  • Thank you very much. I got the solution by replacing one line: `Encoding ascii = Encoding.ASCII;` with `Encoding utf = Encoding.UTF8;` – AnkushSeth May 20 '15 at 06:27
  • Great, glad to hear that :) – Jerin May 20 '15 at 06:28
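
For readers landing here later, the resolution in the comments (using `Encoding.UTF8` instead of `Encoding.ASCII`) can be sketched end to end as below. This is an illustration under assumptions, not the asker's final code: the buffer contents, the `count` variable, and the temp-file name are hypothetical, and the snippet uses C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical input: the UTF-8 bytes of a short Devanagari word, as they might sit in the buffer.
byte[] buffer = Encoding.UTF8.GetBytes("\u0905\u0930\u092C");
int count = buffer.Length; // number of valid bytes in the buffer (assumed known)

// Decode the whole byte range at once instead of casting each byte to char.
string text = Encoding.UTF8.GetString(buffer, 0, count);

// Write with an explicit encoding so the file holds real UTF-8 text, not escapes.
string path = Path.Combine(Path.GetTempPath(), "subtitle-sample.txt"); // hypothetical file name
File.WriteAllText(path, text, Encoding.UTF8);

// Read the file back to confirm the round trip preserved the characters.
string roundTrip = File.ReadAllText(path, Encoding.UTF8);
Console.WriteLine(roundTrip == text); // True
```

If the data arrives in several chunks, a multi-byte character may be split across two buffers; in that case a stateful `Decoder` obtained from `Encoding.UTF8.GetDecoder()` is probably a better fit than calling `GetString` on each chunk separately.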