Java Zip File Extraction

Question

I have a GTFS schedule manager which automatically downloads a zipped file from a specified provider URL and extracts files from it on the fly to a specified folder. So at the end of this process the folder contains just the extracted files and not the zipped file itself.

This has always worked up to now, but with

http://mta.maryland.gov/_googletransit/latest/google_transit.zip

it does not work for some reason. When I try to get the first zip entry from the stream, it is null. However, I can manually download the zipped file to a local folder, point my Java application at that local copy, and it extracts fine. Only the on-the-fly extraction fails.

This is demonstrated by running the code below as it is: you will see the failure. If you then download the zipped file manually to the "feeds" folder and swap around the commented "extractFilesFromFeed.extract" lines in main below the extraction works.

The question is: is there a change I can make to the code below so that the file at this particular URL can be extracted on the fly?

===

import java.io.File;
import java.io.FileOutputStream;
import java.net.URL;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ExtractFilesFromFeed {

  private Logger logger = Logger.getLogger("");

  public void extract(String feedLocation, String feedFolder) throws Exception  {
    if (feedLocation == null || feedLocation.length() == 0) {
      String tmp = "Invalid feed location specified for GTFS schedule file extraction";
      throw new Exception(tmp);
    }
    else if (feedFolder == null || feedFolder.length() == 0) {
      String tmp = "Invalid feed folder specified for GTFS schedule file extraction";
      throw new Exception(tmp);
    }
    else {
      logger.log(Level.INFO, String.format("Extracting GTFS schedule files from %s to %s", 
          feedLocation, feedFolder));
    }

    URL url;
    if (feedLocation.startsWith("http")) {
      url = new URL(feedLocation);
    }
    else {
      url = new File(feedLocation).toURI().toURL();
    }

    File dir = new File(feedFolder);
    if(!dir.exists()){
      dir.mkdir();
    }

    byte[] buffer = new byte[8192];
    ZipInputStream zis = new ZipInputStream(url.openStream());
    ZipEntry ze = zis.getNextEntry();

    if (ze == null) {
      logger.log(Level.WARNING, "Unable to get first entry from zip file, aborting download");
      zis.close();
      throw new Exception(String.format("Unable to get first entry from zip file %s", feedLocation));
    }

    while (ze != null){
      String zipFileName = ze.getName();
      if (ze.isDirectory()) {
        dir = new File(feedFolder + "/" + zipFileName);
        if(!dir.exists()){
          dir.mkdir();
        }
      }
      else {
        FileOutputStream fos = new FileOutputStream(feedFolder + File.separator + zipFileName);
        int len;
        while ((len = zis.read(buffer)) > 0) {
          fos.write(buffer, 0, len);
        }
        fos.close();   
      }
      ze = zis.getNextEntry();
    }
    zis.close();
  }

  public static void main(String[] args) throws Exception {
    ExtractFilesFromFeed extractFilesFromFeed = new ExtractFilesFromFeed();
    extractFilesFromFeed.extract("http://mta.maryland.gov/_googletransit/latest/google_transit.zip", "feeds");
    //extractFilesFromFeed.extract("feeds/google_transit.zip", "feeds");
  }
}

debug your program one step at a time and try to look for an error. SO is a question-and-answer site, not a coding service nor a debugging help - your question is very broad "what is wrong with my code" - pin down the problem and edit your question — Japu_D_Cret, Mar 28 '17 at 08:37
In a nutshell the problem is that the java code above can extract the specified zip file yet it cannot download and extract the same file and I am wondering has anybody else seen this problem with java and zipped files. The code can download and extract other zip files without problem. — paulh, Mar 28 '17 at 10:32
You might want to check out http://stackoverflow.com/questions/15521966/zipinputstream-getnextentry-returns-null-on-some-zip-files. — Sean Barbeau, Mar 28 '17 at 14:49
Actually, looks like this is related to a HTTP 301 - see https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/89#issuecomment-289800989. — Sean Barbeau, Mar 28 '17 at 15:09
HTTP 301 redirect solution at http://stackoverflow.com/a/18431514/937715. — Sean Barbeau, Mar 28 '17 at 15:51

score 1 · Accepted Answer · edited May 23 '17 at 10:30

Looks like there are actually two problems here:

1. http://mta.maryland.gov/_googletransit/latest/google_transit.zip has an HTTP 301 redirect to a secure SSL version at https://mta.maryland.gov/_googletransit/latest/google_transit.zip.
2. The SSL handshake may then fail if the JCE security policy files are not installed in the JVM.

For the redirect, you'll need to use something like the following:

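// Requires: import java.net.URLConnection;
URL url;
if (feedLocation.startsWith("http")) {
    url = new URL(feedLocation);
    URLConnection urlConnection = url.openConnection();
    // HttpURLConnection does not follow a redirect from http to https on its own,
    // so check for a Location header (sent with the HTTP 301) and switch to that URL
    String redirect = urlConnection.getHeaderField("Location");
    if (redirect != null) {
        logger.log(Level.WARNING, "Redirecting to " + redirect);
        url = new URL(redirect);
    }
} else {...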
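
A more general variant of the check above follows the redirect chain by hand instead of reading the Location header once. This is only a sketch under assumptions: the class name RedirectFollower, its helper names, and the limit of 5 redirects are hypothetical and not part of the original fix, and it assumes the feed location is an http or https URL (a file: URL would not cast to HttpURLConnection).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical helper, not part of the original answer.
class RedirectFollower {
    // Assumed limit so that a redirect loop cannot run forever
    private static final int MAX_REDIRECTS = 5;

    // True for the HTTP status codes that carry a Location header to follow
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a Location value (absolute or relative) against the current URL
    static String resolveLocation(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }

    // Opens the stream behind feedLocation, following at most MAX_REDIRECTS redirects,
    // including redirects from http to https
    static InputStream openFollowingRedirects(String feedLocation) throws IOException {
        String current = feedLocation;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(current).openConnection();
            // Handle every redirect here so the behaviour is the same for all protocols
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn.getInputStream();
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header from " + current);
            }
            current = resolveLocation(current, location);
        }
        throw new IOException("Too many redirects starting at " + feedLocation);
    }
}
```

With a helper like this, the stream could be opened as new ZipInputStream(RedirectFollower.openFollowingRedirects(feedLocation)) for remote feeds, while local files keep using url.openStream().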

Then, when opening the input stream, you'll probably want to catch and log any SSLHandshakeExceptions:

// Requires: import javax.net.ssl.SSLHandshakeException;
try {
    ZipInputStream zis = new ZipInputStream(url.openStream());
    ...
} catch (SSLHandshakeException sslEx) {
    // java.util.logging has no Level.ERROR; SEVERE is its highest level
    logger.log(Level.SEVERE, "SSL handshake failed.  Try installing the JCE Extension - see http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html", sslEx);
}

To install the JCE Extension, you will need to replace the US_export_policy.jar and local_policy.jar files in your JVM /security directory, such as C:\Program Files\Java\jdk1.8.0_73\jre\lib\security, with the JAR files in the JCE Extension download.
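
Before replacing any JAR files, it may help to check which policy the JVM is currently running with. The following is a small diagnostic sketch, not part of the original fix: the class name JcePolicyCheck and its methods are hypothetical, and it relies on the assumption that a JVM with the limited policy caps AES at 128 bits while the unlimited policy reports a much larger value. Newer JDK releases may already ship with the unlimited policy enabled, in which case no download is needed.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

// Hypothetical diagnostic helper, not part of the original answer.
class JcePolicyCheck {
    // Largest AES key size (in bits) the installed policy files allow, or 0 if AES is unavailable
    static int maxAesKeyBits() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            return 0;
        }
    }

    // Assumption: anything above 128 bits means the unlimited-strength policy is active
    static boolean isUnlimited(int maxAesKeyBits) {
        return maxAesKeyBits > 128;
    }

    public static void main(String[] args) {
        int bits = maxAesKeyBits();
        System.out.println("Max AES key length: " + bits
                + " bits, unlimited policy: " + isUnlimited(bits));
    }
}
```

If this reports that the unlimited policy is already active and the handshake still fails, the cause is probably something other than the policy files.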

I just fixed this same issue in our project - the commit that resolved the issue is https://github.com/CUTR-at-USF/gtfs-realtime-validator/commit/180785d22ca58afa2463b322ad4e1b122c5f0a30, and the GitHub issue for the problem was https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/89.

Credit to the "301 Moved Permanently" answer (http://stackoverflow.com/a/18431514/937715) for the HTTP 301 redirect solution and https://stackoverflow.com/a/30760134/937715 for installing the JCE Extension.

This is the resolution in this case. The redirect response was being treated as the zip file, so the first 4 bytes were not the correct file signature, which should be 0x04034b50. There is a check for this in the Java 8 ZipInputStream class. — paulh, Mar 29 '17 at 10:47
@paulh Glad this worked! Could you please go ahead and accept it as the answer so others know too? — Sean Barbeau, Mar 29 '17 at 13:29
I tried but even though it notes my action it won't display it publicly as I am new to the system and have a reputation score less than 15. — paulh, Mar 30 '17 at 15:48
Ok got it, I was actually trying to up vote to accept the answer and this is restricted. Just noticed the tick which changed color to green once clicked. — paulh, Mar 31 '17 at 08:32
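
Following up on the file signature mentioned in the comments above: a null first entry gives no hint about why extraction failed, so a check of the leading bytes can produce a clearer error. This is a sketch only. The class ZipSignatureCheck and its methods are hypothetical, and it assumes the archive starts with a local file header, which is stored on the wire as the bytes 50 4B 03 04 (0x04034b50 in little-endian order). An archive with no entries starts with a different signature and would be rejected by this check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of the original answer.
class ZipSignatureCheck {
    // True if the first four bytes are the local file header signature "PK\3\4"
    static boolean startsWithZipSignature(byte[] head, int len) {
        return len >= 4
                && head[0] == 0x50 && head[1] == 0x4b
                && head[2] == 0x03 && head[3] == 0x04;
    }

    // Peeks at the first four bytes and fails early if they are not a zip signature;
    // the bytes are pushed back so the returned stream can be handed to ZipInputStream
    static InputStream requireZip(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 4);
        byte[] head = new byte[4];
        int n = 0;
        while (n < 4) {
            int r = pin.read(head, n, 4 - n);
            if (r < 0) {
                break; // stream ended before four bytes were available
            }
            n += r;
        }
        if (!startsWithZipSignature(head, n)) {
            throw new IOException("Response does not start with a zip signature; "
                    + "it may be a redirect or an error page");
        }
        pin.unread(head, 0, n);
        return pin;
    }
}
```

It could be used as new ZipInputStream(ZipSignatureCheck.requireZip(url.openStream())), so that a redirect body fails with a descriptive message instead of a null entry.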
