
I am extracting a zip file using System.IO.Compression in .NET 4.5:

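using System.IO.Compression;
using System.Linq;
using System.Text;

var path = @"<zipfilePath>";
using (ZipArchive zarc = ZipFile.Open(path, ZipArchiveMode.Read, Encoding.UTF8))
{
    // FullName of the first entry already comes back garbled here
    var file = zarc.Entries.First().FullName;
}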

電子メール・テンプレート___第_6_世代インテル®_コア™_ヴィープロ™プロセッサー(カスタマイズ可能

This is the name of my file inside the zip. After extraction, the filename becomes:

pchir�gzxsii___u_6_wtyzgrr_nwT_axixtT_xtvdpi_(ftzeyulb

I know it's an encoding issue, but I am not sure which encoding to use here. Additionally, if a zip contains filenames in other languages such as Chinese or Korean, how should I handle the encoding for each so that I get the exact file name after extraction? Thanks in advance.

Debashish Saha
  • Are you sure that the original file name is valid UTF-8? It reads like something about "Intel® CORE™ VIPRO™", but the closing bracket is missing. Maybe it's just truncated and therefore decodes into garbage. – Cee McSharpface Nov 26 '18 at 21:38
  • @dlatikay I used the answer at https://stackoverflow.com/questions/3825390/effective-way-to-find-any-files-encoding to check the encoding of the above file. It's us-ascii, code page 1252. The same file is extracted correctly with 7zip. – Debashish Saha Nov 26 '18 at 21:43
  • That would be the content encoding. I read your question as saying that the encoding of the file's *name*, as stored in the ZIP file directory, ends up garbled after extracting. Could you show the code used to actually extract the file? – Cee McSharpface Nov 26 '18 at 21:46
  • @dlatikay I am using the ExtractToFile() extension method to extract. – Debashish Saha Nov 26 '18 at 21:50
  • But the `FullName` as read in your code example is still correct? Then I'd suggest using the stream from the ZipFileInfo and just calling `File.WriteAllBytes` yourself. – Cee McSharpface Nov 26 '18 at 22:05
  • @dlatikay No. The full name itself has an incorrect value in it. That's why I am getting an incorrect file name while extracting. – Debashish Saha Nov 26 '18 at 22:13
  • But have you tried to extract a file without specifying an encoding? It's not usually necessary. – Jimi Nov 26 '18 at 22:49
  • @Jimi Yes. I have tried without explicitly specifying an encoding, and I have also tried most of the available encodings. Nothing worked for me. – Debashish Saha Nov 26 '18 at 22:56
  • Is it just the file name that is problematic, or the file content too? The text you're showing is Japanese. You could save the file name text with an Encoding from code page `51932`, Japanese (EUC). Is the file content OK? – Jimi Nov 26 '18 at 23:04
  • I mean, with `Encoding encoding = Encoding.GetEncoding(51932);`, if you're using the full version of the .NET Framework. – Jimi Nov 26 '18 at 23:11
  • I have tried that too. It's not working. The file name as well as the content are incorrect. – Debashish Saha Nov 26 '18 at 23:22
  • Well, you should try to open the archive without specifying any encoding. Then, if the **file name** gets garbled, try to read the file name string with the Japanese (EUC) encoding, then save it (to disk) as UTF-8 (the file name, not the file content; the content is another matter entirely and is not related to compression/decompression). – Jimi Nov 26 '18 at 23:34
  • @Jimi What if I have multiple files of this kind, each with a different encoding, and I need to extract each of them with the correct name? – Debashish Saha Nov 26 '18 at 23:37
  • Well, that's a good question. I don't think you can read the entry-name encoding directly with the current implementation. You can get to it through reflection: it's a non-public property in the ZipArchive, `EntryNameEncoding`. It's usually `null`. The internal encoding defaults to UTF-8. You should probably use a different utility/library if you need extended support for encodings and other internal information. ZipFile is bare-bones. – Jimi Nov 26 '18 at 23:58
  • See: [SharpZipLib - GitHub](https://github.com/icsharpcode/SharpZipLib/blob/master/README.md) and [SevenZipSharp - GitHub](https://github.com/squid-box/SevenZipSharp) – Jimi Nov 27 '18 at 00:03
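
For reference, the encoding suggestions in the comments above can be sketched as follows. As documented for `ZipFile.Open(String, ZipArchiveMode, Encoding)`, the encoding argument is only applied to entries whose ZIP "language encoding" (UTF-8) flag is not set; flagged entries are always decoded as UTF-8. So passing `Encoding.UTF8` cannot help with names stored in a legacy code page, while passing that code page might. The class and method names below (`ZipNameSketch`, `ReadEntryNames`) are hypothetical, and which code page an archive actually uses is something you have to know or guess; this is a minimal sketch, not a general solution.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

public static class ZipNameSketch
{
    // Hypothetical helper: returns the FullName of every entry in the archive.
    // Entry names without the ZIP UTF-8 flag are decoded with `fallback`;
    // entries that carry the flag are decoded as UTF-8 regardless.
    public static string[] ReadEntryNames(string zipPath, Encoding fallback)
    {
        using (ZipArchive archive = ZipFile.Open(zipPath, ZipArchiveMode.Read, fallback))
        {
            return archive.Entries.Select(e => e.FullName).ToArray();
        }
    }
}
```

For the file in the question, one thing to try would be `ZipNameSketch.ReadEntryNames(path, Encoding.GetEncoding(932))`, on the assumption (not confirmed anywhere in this thread) that the archive was created on a Japanese Windows system and stores names in Shift-JIS. On .NET Core and later, code pages such as 932 are only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`. If different archives use different code pages, the encoding has to be chosen per archive, since the archive itself does not record which legacy code page was used.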
