I am developing a web application using ASP.NET/C# and Azure, and I use Azure Blob storage to store files. I have a problem storing files whose text is in languages other than English (only English works).

Example: I save this as a .txt file: "한나라당 전당대회 돈 봉투 사건을 수사하고 있는 검찰이 박희태 국회의장 비서관 사무실을 전격 압수수색했습니다. 조정만, 이봉건 두 수석 비서관실과 여비서 함모 씨가 근무하는 부속실입니다. 서울중앙지검 공안1부는 오늘(19일) 아침 8시 20분 서울 여의도"

But when I retrieve it, it shows: "한나ë¼ë‹¹ ì „ë‹¹ëŒ€íšŒ ëˆ ë´‰íˆ¬ ì‚¬ê±´ì„ ìˆ˜ì‚¬í•˜ê³ ìžˆëŠ” ê²€ì°°ì´ ë°•í¬íƒœ 국회ì˜ìž¥ 비서관 ì‚¬ë¬´ì‹¤ì„ ì „ê²© 압수수색했습니다. ì¡°ì •ë§Œ, ì´ë´‰ê±´ ë‘ ìˆ˜ì„ ë¹„ì„œê´€ì‹¤ê³¼ 여비서 함모 씨가 근무하는 ë¶€ì†ì‹¤ìž…니다. 서울중앙지검 공안1부는 오늘(19ì¼) 아침 8시 20ë¶„ 서울 ì—¬ì˜ë„"

What is the problem?

Thanks
Nahid

Md Nasir Uddin
  • This has nothing to do with Azure; it is a text encoding issue. – Chandermani Jan 23 '12 at 08:33
  • @Chandermani is correct, I think. Could you try specifying UTF-8 or UTF-16 when you read the file back? On disk, files often have a BOM that tells the reader which encoding the file uses, but I suspect this is lost or ignored when you download the file from the blob. –  Jan 23 '12 at 08:54

1 Answer

You have to save your text files in a Unicode encoding such as UTF-8 (not ASCII).

UPDATE after @naruse's comment

You also have to specify the content type property for the blob, including the charset. I do that for the Cyrillic alphabet and it works perfectly. There should be no issues with Korean either.

If it is a plain text file, the proper value for Content Type should be:

text/plain; charset=utf-8

Or whichever charset your files are actually saved in.
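To illustrate the point about encodings, here is a minimal C# sketch that uses only System.Text and does not call the storage SDK. The class and method names are hypothetical, and the upload step is only indicated in a comment, using the `blob.Properties.ContentType` property mentioned in the comments below. It shows that UTF-8 bytes survive the round trip when they are decoded as UTF-8, and that reading the same bytes with a single-byte encoding produces garbled text like that in the question.

```csharp
using System;
using System.Text;

public static class BlobTextEncodingSketch
{
    // Content type suggested above; the charset must match the bytes actually stored.
    public const string ContentType = "text/plain; charset=utf-8";

    // Encode explicitly as UTF-8 (without a BOM) before uploading the bytes.
    public static byte[] ToUtf8Bytes(string text)
    {
        return new UTF8Encoding(false).GetBytes(text);
    }

    // Decode downloaded bytes explicitly as UTF-8 instead of relying on a default.
    public static string FromUtf8Bytes(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // What goes wrong: the same bytes interpreted with a single-byte encoding.
    public static string DecodeAsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    public static void Main()
    {
        string original = "한나라당 전당대회";
        byte[] stored = ToUtf8Bytes(original);

        // Assumed upload step (not shown): set blob.Properties.ContentType = ContentType
        // and upload the `stored` bytes rather than text in a default encoding.

        Console.WriteLine(FromUtf8Bytes(stored) == original);  // True: text is preserved
        Console.WriteLine(DecodeAsLatin1(stored) == original); // False: text is garbled
    }
}
```

If the file bytes are already UTF-8 and the reader decodes them as UTF-8, the content type header mainly helps browsers and other HTTP clients pick the right decoder.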

astaykov
  • I tried this, but it is not working: 1. blob.Properties.ContentType = Encoding.UTF8.HeaderName; 2. blob.Properties.ContentType = " charset=utf-8"; 3. blob.Properties.ContentLanguage = Encoding.UTF8.HeaderName; None of these works. – Md Nasir Uddin Jan 23 '12 at 12:58
  • What about the file itself, not only the blob properties? Did you try saving the file with an "advanced" text editor such as Notepad++ or UltraEdit, or anything else that supports UTF-8 encoding, so that you can explicitly set a Unicode encoding for the file itself? Try with and without a BOM (Byte Order Mark). – astaykov Jan 23 '14:02
  • Content-Encoding is not meant for this; its value is usually gzip or deflate. The charset should be given as the charset parameter of Content-Type. – naruse Oct 09 '14 at 07:45
  • @naruse, it appears that the problem has been solved by simply converting the files to UTF-8, but I wonder what brings you to this question and answer, which are more than 2 years old? – astaykov Oct 09 '14 at 10:36