
Often a URL which has a .jpg extension can turn out to be a .gif or .mp4 type of file and vice versa. Is there a way to figure out exactly what type of file a URL contains without downloading the entire file?

Example: http://i.imgur.com/9b4bIW9.jpg

This has .jpg extension, but is actually a .gif.

Adinia

1 Answer


NOTE: My solution requires:

compile 'com.google.guava:guava:19.0'

as it provides the ByteStreams.toByteArray function, which reads an input stream into a byte array. Of course, you can read the input stream some other way.
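If you would rather not pull in Guava, the read can be done with plain java.io. The following is a minimal sketch, not part of the original solution: the class name HeaderReader and the method readAtMost are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper (not from the original answer): reads a bounded
// prefix of a stream without any third-party library.
final class HeaderReader {

    /** Reads up to maxBytes from the stream, stopping early at end of stream. */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes) {
            int n = in.read(buffer, total, maxBytes - total);
            if (n < 0) {
                break; // end of stream reached before maxBytes were read
            }
            total += n;
        }
        // Trim the buffer to the number of bytes actually read.
        return Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream(): the first bytes of a GIF.
        InputStream in = new ByteArrayInputStream(
                new byte[] {0x47, 0x49, 0x46, 0x38, 0x39, 0x61});
        byte[] head = readAtMost(in, 4);
        System.out.println("Read " + head.length + " bytes");
    }
}
```

In the code further down, a call such as HeaderReader.readAtMost(connection.getInputStream(), 1) could stand in for ByteStreams.toByteArray. Because the read is capped, it also stops early if a server ignores the Range header and starts sending the whole file.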

NOTE: the StrictMode.ThreadPolicy setup is required because this example makes the network request on the main thread; without it, Android throws a NetworkOnMainThreadException.
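An alternative to relaxing StrictMode is to run the request on a worker thread. The sketch below uses only java.util.concurrent and is an assumption on my part rather than something from the original solution: BackgroundProbe, Callback and detectAsync are hypothetical names, and the detector argument stands for whatever code performs the HTTP request.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper (not from the original answer): runs a detection
// task on a single background thread and reports the result via a callback.
final class BackgroundProbe {

    /** Receives either a detected type or the exception that occurred. */
    interface Callback {
        void onResult(String type, Exception error);
    }

    // One daemon worker thread, so it never keeps the process alive.
    private static final ExecutorService EXECUTOR =
            Executors.newSingleThreadExecutor(runnable -> {
                Thread thread = new Thread(runnable, "type-probe");
                thread.setDaemon(true);
                return thread;
            });

    /** Runs the detector off the calling thread and passes the outcome to the callback. */
    static void detectAsync(Callable<String> detector, Callback callback) {
        EXECUTOR.execute(() -> {
            String type = null;
            Exception error = null;
            try {
                type = detector.call();
            } catch (Exception e) {
                error = e;
            }
            // Invoked on the worker thread, not on the caller's thread.
            callback.onResult(type, error);
        });
    }
}
```

Note that the callback runs on the worker thread. On Android, any UI update made from it would have to be posted back to the main thread, for example with Activity.runOnUiThread.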

Basically, we open an HTTP connection but use a Range header to request only the first byte of the remote file, so we don't need to download the entire file. We then pass the byte array through the bytesToHex function to get it as a hex string. Finally, we compare that first byte against the known file signatures.

For other file types and their byte signatures, you can refer to: http://www.garykessler.net/library/file_sigs.html

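Code (placed inside an Activity; it needs imports for android.os.Bundle, android.os.StrictMode, java.io.IOException, java.net.HttpURLConnection, java.net.URL and com.google.common.io.ByteStreams):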

protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState); 
    StrictMode.ThreadPolicy policy = new StrictMode.ThreadPolicy.Builder().permitAll().build();
    StrictMode.setThreadPolicy(policy);
    try {
        detectTypeOfFile();
    } catch (IOException e) {
        System.out.println("URL: CRASH: " + e.getMessage());
        e.printStackTrace();
    }
}

final protected static char[] hexArray = "0123456789ABCDEF".toCharArray();
public static String bytesToHex(byte[] bytes) {
    //http://stackoverflow.com/questions/9655181/how-to-convert-a-byte-array-to-a-hex-string-in-java
    char[] hexChars = new char[bytes.length * 2];
    for ( int j = 0; j < bytes.length; j++ ) {
        int v = bytes[j] & 0xFF;
        hexChars[j * 2] = hexArray[v >>> 4];
        hexChars[j * 2 + 1] = hexArray[v & 0x0F];
    }
    return new String(hexChars);
}

public void detectTypeOfFile() throws IOException {

    String[] urls = {"http://i.imgur.com/9b4bIW9.jpg","http://i.imgur.com/f00y2uz.jpg","http://i.imgur.com/9b4bIW9.mp4","http://i.imgur.com/9b4bIW9.gif"};

    for (int i=0;i<urls.length;i++){
        URL url = new URL(urls[i]);
        HttpURLConnection connection = ((HttpURLConnection) url.openConnection());
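        // Ask the server for the first byte only (the range 0-0 is inclusive).
        connection.setRequestProperty("Range", "bytes=0-0");
        connection.connect();
        // Note: a server that ignores Range replies with the whole body.
        byte[] bytes = ByteStreams.toByteArray(connection.getInputStream());
        String firstByte = bytesToHex(bytes);
        System.out.println("URL: " + url.toString() + "  has first byte: " + firstByte);
        switch (firstByte) {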
            //http://www.garykessler.net/library/file_sigs.html
            case "00":
                System.out.println("URL: " + url.toString() + "  is of type: mp4");
                break;
            case "FF":
                System.out.println("URL: " + url.toString() + "  is of type: image/jpeg");
                break;
            case "89":
                System.out.println("URL: " + url.toString() + "  is of type: image/png");
                break;
            case "47":
                System.out.println("URL: " + url.toString() + "  is of type: image/gif");
                break;
            case "49":
            case "4D":
                System.out.println("URL: " + url.toString() + "  is of type: image/tiff");
                break;
        }
        connection.disconnect();
    }
}

Output from above:

06-05 01:51:47.022 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.jpg  has first byte: 47
06-05 01:51:47.022 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.jpg  is of type: image/gif
06-05 01:51:47.056 12554-12554/? I/System.out: URL: http://i.imgur.com/f00y2uz.jpg  has first byte: FF
06-05 01:51:47.056 12554-12554/? I/System.out: URL: http://i.imgur.com/f00y2uz.jpg  is of type: image/jpeg
06-05 01:51:47.091 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.mp4  has first byte: 00
06-05 01:51:47.091 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.mp4  is of type: mp4
06-05 01:51:47.124 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.gif  has first byte: 47
06-05 01:51:47.124 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.gif  is of type: image/gif
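
A single byte is a fairly weak signature, since several unrelated formats share the same first byte. If you request a few more bytes (for example with a Range of bytes=0-11), you can compare against longer signatures. The following is a minimal sketch under that assumption; SignatureSniffer, detect and matchesAt are hypothetical names, and the signature list covers only the types handled above.

```java
// Hypothetical helper (not from the original answer): matches the leading
// bytes of a file against a few well-known multi-byte signatures.
final class SignatureSniffer {

    /** Returns a MIME type guess for the given leading bytes, or "unknown". */
    static String detect(byte[] head) {
        if (matchesAt(head, 0, 0xFF, 0xD8, 0xFF)) {
            return "image/jpeg";
        }
        if (matchesAt(head, 0, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "image/png";
        }
        if (matchesAt(head, 0, 0x47, 0x49, 0x46, 0x38)) { // "GIF8"
            return "image/gif";
        }
        if (matchesAt(head, 0, 0x49, 0x49, 0x2A, 0x00)     // little-endian TIFF
                || matchesAt(head, 0, 0x4D, 0x4D, 0x00, 0x2A)) { // big-endian TIFF
            return "image/tiff";
        }
        // "ftyp" at offset 4 marks an ISO base media file. MP4 is the common
        // case, but related containers use the same marker, so this is a guess.
        if (matchesAt(head, 4, 0x66, 0x74, 0x79, 0x70)) {
            return "video/mp4";
        }
        return "unknown";
    }

    /** True if data contains the signature bytes starting at the given offset. */
    private static boolean matchesAt(byte[] data, int offset, int... signature) {
        if (data.length < offset + signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[offset + i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With this in place, the switch on the hex string could be replaced by a call such as SignatureSniffer.detect(bytes), provided the request asks for enough bytes to cover the longest signature.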