
My program works for test1.txt to test3.txt. However, it throws an error when reading 你好.txt (你好 means "hello"). How can I modify my program to fix this problem?

Here is the folder structure. The note folder contains two zip files, zipFile1.zip and zipFile2.zip. I would appreciate any help.

Operating System: Windows 10

Error message: Exception in thread "main" java.util.zip.ZipException: invalid CEN header (bad entry name)

Java version: openjdk version "1.8.0_191-1-ojdkbuild"

note
├── zipFile1.zip
│   ├── test1.txt
│   └── test2.txt
└── zipFile2.zip
    ├── test3.txt
    └── 你好.txt

Here is my program.

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class test {

    private static final String SOURCE_FOLDER = "note_folder_path";

    // Entry names in these zip files are encoded in MS950 (Traditional Chinese), not UTF-8
    static final Charset CHINESE_CHARSET = Charset.forName("MS950");

    private static final int BUFFER_SIZE = 2048;

    public static void main(String[] args) throws IOException {
        File[] files = new File(SOURCE_FOLDER).listFiles();
        if (files == null) {
            throw new IOException("Cannot list folder: " + SOURCE_FOLDER);
        }
        for (File file : files) {
            // Only process zip archives (skips folders left over from a previous run)
            if (file.isFile() && file.getName().endsWith(".zip")) {
                extractFolder(file.getAbsolutePath());
            }
        }
    }

    public static void extractFolder(String zipFile) throws IOException {
        // Decode entry names with MS950; the ZipFile(File) constructor would use UTF-8
        try (ZipFile zip = new ZipFile(new File(zipFile), CHINESE_CHARSET)) {
            // Extract into a folder named after the archive, without the ".zip" suffix
            String newPath = zipFile.substring(0, zipFile.length() - 4);
            new File(newPath).mkdir();

            Enumeration<? extends ZipEntry> zipFileEntries = zip.entries();

            // Process each entry
            while (zipFileEntries.hasMoreElements()) {
                ZipEntry entry = zipFileEntries.nextElement();
                String currentEntry = entry.getName();
                File destFile = new File(newPath, currentEntry);

                // Create the parent directory structure if needed
                destFile.getParentFile().mkdirs();

                if (!entry.isDirectory()) {
                    // Copy the entry to disk; try-with-resources closes both streams
                    try (InputStream is = new BufferedInputStream(zip.getInputStream(entry));
                         BufferedOutputStream dest =
                                 new BufferedOutputStream(new FileOutputStream(destFile), BUFFER_SIZE)) {
                        byte[] data = new byte[BUFFER_SIZE];
                        int bytesRead;
                        // Read and write until the end of the entry is reached
                        while ((bytesRead = is.read(data, 0, BUFFER_SIZE)) != -1) {
                            dest.write(data, 0, bytesRead);
                        }
                    }
                }

                // Found a nested zip file: extract it too
                if (currentEntry.endsWith(".zip")) {
                    extractFolder(destFile.getAbsolutePath());
                }
            }
        }
    }
}
Mr.mmmmmm
  • [1] Provide the content of 你好.txt that reproduces the problem. [2] You need to provide more details than _"some error when reading 你好.txt"_. What is the error you get? If there is an exception then post the stack trace. [3] Update your question to state the version of Java being used, and your O/S platform. [4] How were the zips created? Using Java, or some other tool? [5] If you rename `你好.txt` to `test4.txt` in `zipFile2.zip`, does everything work? That is, does the problem arise due to the _name_ of that file, or the _content_ of that file? Don't post comments - just update the question. – skomisa Jul 12 '21 at 03:57
  • Thank you for your reply. @skomisa [1] The content of 你好.txt consists of two lines. The first line is 何姑娘你好 ("Hello, Miss He") and the second line is 我是陳醫生 ("I am Dr. Chen"). [2] The error message is: "Exception in thread "main" java.util.zip.ZipException: invalid CEN header (bad entry name)" [3] Java version: `openjdk version "1.8.0_191-1-ojdkbuild"` [5] No. I guess the problem is caused by the Chinese characters in the file name and also in the file's content. The files `test1` to `test3` contain only English text. Thank you very much! – Mr.mmmmmm Jul 12 '21 at 04:13
  • [1] OK, but please note the final sentence of my comment: **Don't post comments - just update the question**! It should not be necessary for readers to go through the comments to understand your problem. [2] In addition, update the question to show your operating system, the stack trace content, the failing line in your code, and details on how the zip files were created, including the encoding. – skomisa Jul 12 '21 at 04:24
  • Please update your question with the information requested in my previous comment. In particular, detail precisely how the zip files were created, and the encoding(s) that were used, so that others can attempt to reproduce/resolve your problem. Without that information your question is unanswerable. – skomisa Jul 12 '21 at 08:31
  • [1] Your constructor for `ZipFile()` means the _"UTF-8 charset is used to decode the entry names"_, so your problem is probably that the zip file was not created using UTF-8 encoding. [2] See [Java, unzip folder with German characters in filenames](https://stackoverflow.com/q/55393956/2985643) for a fix for this when using 7-Zip to create a zip file, but regardless of the creation method, the encoding used to create a zip file and the encoding used to read that zip file should match. – skomisa Jul 12 '21 at 16:25
  • Sorry, I forgot to reply to you. I solved this problem by adding a charset with the encoding "MS950". I will close this question and update it with my modified code. Thank you for your reply and effort. – Mr.mmmmmm Jul 13 '21 at 01:08
  • OK. Please consider posting an answer to your own question, and accepting that answer. That is more helpful to the community than having the resolution buried in a comment. – skomisa Jul 13 '21 at 06:22
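A minimal sketch that reproduces the mismatch skomisa describes, assuming (as the asker's fix suggests) that the entry names were written in MS950 without the UTF-8 flag. It writes such an archive to a temporary file with `ZipOutputStream`, then lists it once with UTF-8 (what the `ZipFile(File)` constructor uses) and once with MS950. The class `CharsetMismatchDemo` and its helpers are hypothetical; which exception is thrown (`ZipException` or `IllegalArgumentException`) and whether it happens when opening or when iterating depends on the JDK version.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class CharsetMismatchDemo {

    static final Charset MS950 = Charset.forName("MS950");
    static final String NAME = "\u4f60\u597d.txt"; // the file name from the question

    // Writes a zip with one entry whose name is encoded in nameCharset.
    // ZipOutputStream only sets the UTF-8 flag (bit 11) when nameCharset is UTF-8.
    static File writeZip(String entryName, Charset nameCharset) throws IOException {
        File f = File.createTempFile("demo", ".zip");
        f.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(f), nameCharset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return f;
    }

    // Lists entry names, decoding them with nameCharset
    static List<String> listNames(File zip, Charset nameCharset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip, nameCharset)) {
            Enumeration<? extends ZipEntry> e = zf.entries();
            while (e.hasMoreElements()) {
                names.add(e.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        File zip = writeZip(NAME, MS950);
        try {
            // Equivalent to new ZipFile(zip): names are decoded as UTF-8
            System.out.println("UTF-8: " + listNames(zip, StandardCharsets.UTF_8));
        } catch (IOException | IllegalArgumentException ex) {
            System.out.println("UTF-8 failed: " + ex);
        }
        // Using the charset the names were written with works
        System.out.println("MS950: " + listNames(zip, MS950));
    }
}
```

This needs a JDK that ships the extended charsets, since MS950 is not one of the standard charsets every Java implementation must provide.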

1 Answer


This is a filename encoding issue and has nothing to do with the contents of 你好.txt.

Looking at the source code for java.util.zip, the error message invalid CEN header (bad entry name) is thrown in two cases:

  1. if the language encoding bit (bit 11 of the general-purpose bit flag) is set in the zip file and the filename is not valid UTF-8;
  2. if UTF-8 is not being used and the encoding of the filename does not match the encoding named in the encoding environment variable.
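To show what such a mismatch looks like at the byte level, here is a small sketch (the class `NameBytesDemo` is hypothetical). It assumes, as the asker's eventual fix suggests, that the names were written in MS950: it encodes the file name from the question in MS950, then decodes those bytes with a strict UTF-8 decoder, which rejects them as malformed, while decoding with MS950 recovers the name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class NameBytesDemo {

    // The file name from the question, written with escapes to keep the source ASCII
    static final String NAME = "\u4f60\u597d.txt";

    // Formats bytes as space-separated hex
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes strictly: malformed input raises an exception instead of becoming U+FFFD
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return dec.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] ms950 = NAME.getBytes(Charset.forName("MS950"));
        System.out.println("MS950 bytes: " + hex(ms950));
        System.out.println("UTF-8 bytes: " + hex(NAME.getBytes(StandardCharsets.UTF_8)));
        try {
            decodeStrict(ms950, StandardCharsets.UTF_8);
            System.out.println("MS950 bytes decoded as UTF-8 (unexpected)");
        } catch (CharacterCodingException e) {
            System.out.println("MS950 bytes are not valid UTF-8: " + e);
        }
    }
}
```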

See "Setting the default Java character encoding" for details on using the Java encoding environment variable.

To know for sure what the issue is, could you share the zip file?

If not, could you post a readout of the internal structure of the file by running zipdetails against the zip file? Usage:

zipdetails -v  zipFile2.zip

This program is included in most recent Linux distributions. As you are running Windows, you can get access to this script through WSL (Windows Subsystem for Linux) if you don't have access to a Linux distribution.
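If neither zipdetails nor WSL is available, the two details that matter here can also be read with a short Java program. The sketch below is hypothetical (class and method names are made up) and assumes a single-disk archive without ZIP64 records. It finds the End of Central Directory record, walks the central directory headers, and prints for each entry whether general-purpose bit 11 (the UTF-8 / language encoding flag) is set, together with the raw name bytes. Because it never decodes the names, it still works when `ZipFile` throws.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CentralDirectoryDump {

    static final int EOCD_SIG = 0x06054b50; // End of Central Directory signature
    static final int CEN_SIG = 0x02014b50;  // central directory file header signature

    // Raw view of one central directory entry: flags and undecoded name bytes
    static final class RawEntry {
        final int flags;
        final byte[] name;
        RawEntry(int flags, byte[] name) { this.flags = flags; this.name = name; }
        boolean utf8Flag() { return (flags & 0x0800) != 0; } // bit 11
    }

    static List<RawEntry> readCentralDirectory(byte[] zip) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        // Search backwards for the EOCD record (22 bytes plus a comment of up to 65535 bytes)
        int eocd = -1;
        for (int i = zip.length - 22; i >= Math.max(0, zip.length - 22 - 0xFFFF); i--) {
            if (buf.getInt(i) == EOCD_SIG) { eocd = i; break; }
        }
        if (eocd < 0) throw new IOException("End of Central Directory record not found");

        int count = buf.getShort(eocd + 10) & 0xFFFF; // total number of entries
        int pos = buf.getInt(eocd + 16);               // offset of the central directory
        List<RawEntry> entries = new ArrayList<>();
        for (int n = 0; n < count; n++) {
            if (buf.getInt(pos) != CEN_SIG) throw new IOException("Bad central directory header at " + pos);
            int flags = buf.getShort(pos + 8) & 0xFFFF;
            int nameLen = buf.getShort(pos + 28) & 0xFFFF;
            int extraLen = buf.getShort(pos + 30) & 0xFFFF;
            int commentLen = buf.getShort(pos + 32) & 0xFFFF;
            byte[] name = new byte[nameLen];
            System.arraycopy(zip, pos + 46, name, 0, nameLen); // name follows the 46-byte header
            entries.add(new RawEntry(flags, name));
            pos += 46 + nameLen + extraLen + commentLen;
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "zipFile2.zip"));
        for (RawEntry e : readCentralDirectory(zip)) {
            StringBuilder hex = new StringBuilder();
            for (byte b : e.name) hex.append(String.format("%02X ", b & 0xFF));
            System.out.println("UTF-8 flag=" + e.utf8Flag() + "  name bytes=" + hex.toString().trim());
        }
    }
}
```

Run it as `java CentralDirectoryDump zipFile2.zip`. An entry with the flag unset and name bytes that are not valid UTF-8 matches the situation described in this answer.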

pmqs
  • ...which means the ZIP was created disregarding the PKZIP standard. But it could have the "Info-ZIP Unicode Path" extra field (0x7075), which Java does not support, though. – AmigoJack Jul 12 '21 at 15:11
  • Thank you for your reply. I solved this problem by changing the encoding from the default "UTF-8" to "MS950". You can check the latest version of my code above. Thank you so much! – Mr.mmmmmm Jul 13 '21 at 01:11
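AmigoJack's suggestion can be checked with a small sketch. An Info-ZIP Unicode Path extra field (header ID 0x7075) holds a version byte (1), the CRC-32 of the entry's standard name field, and then the name in UTF-8; as the comment notes, Java does not support it, so a program would have to read it itself. The class and both methods below are hypothetical, and the sketch assumes the raw name bytes and extra-field bytes of an entry were already read straight from the central directory, since `ZipFile` cannot list an entry whose name it fails to decode.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class UnicodePathExtra {

    static final int UNICODE_PATH_ID = 0x7075; // Info-ZIP Unicode Path extra field ID

    // Returns the UTF-8 name from an Info-ZIP Unicode Path block in extra, or null when
    // there is no such block, its version is not 1, or its CRC-32 does not match rawName
    static String unicodePath(byte[] extra, byte[] rawName) {
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 0;
        // An extra field is a sequence of blocks: header ID (2), data size (2), data
        while (pos + 4 <= extra.length) {
            int id = buf.getShort(pos) & 0xFFFF;
            int size = buf.getShort(pos + 2) & 0xFFFF;
            int data = pos + 4;
            if (data + size > extra.length) {
                break; // truncated block
            }
            if (id == UNICODE_PATH_ID && size >= 5 && extra[data] == 1) {
                // Block data: version (1 byte), CRC-32 of the standard name (4 bytes), UTF-8 name
                long storedCrc = buf.getInt(data + 1) & 0xFFFFFFFFL;
                CRC32 crc = new CRC32();
                crc.update(rawName, 0, rawName.length);
                if (crc.getValue() == storedCrc) {
                    return new String(extra, data + 5, size - 5, StandardCharsets.UTF_8);
                }
            }
            pos = data + size;
        }
        return null;
    }

    // Builds a Unicode Path block for demonstration (hypothetical helper)
    static byte[] buildField(byte[] rawName, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(rawName, 0, rawName.length);
        ByteBuffer field = ByteBuffer.allocate(4 + 5 + utf8.length).order(ByteOrder.LITTLE_ENDIAN);
        field.putShort((short) UNICODE_PATH_ID)
             .putShort((short) (5 + utf8.length))
             .put((byte) 1)
             .putInt((int) crc.getValue())
             .put(utf8);
        return field.array();
    }

    public static void main(String[] args) {
        String name = "\u4f60\u597d.txt"; // the file name from the question
        byte[] rawName = name.getBytes(Charset.forName("MS950")); // assumed legacy name bytes
        System.out.println(unicodePath(buildField(rawName, name), rawName));
    }
}
```

The CRC check matters: tools are expected to ignore the Unicode name if the standard name field was changed after the extra field was written.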