20

I need to check the content type (whether it's image, audio or video) of a URL entered by the user. I have code like this:

URL url = new URL(urlname);
URLConnection connection = url.openConnection();
connection.connect();
String contentType = connection.getContentType();

I do get the content type, but it seems the whole file has to be downloaded to check its content type, so it takes too long when the file is large. I need to use this in a Google App Engine application, where requests are limited to 30 seconds.

Is there any other way to get the content type of a url without downloading the file (so it could be done quicker)?

skaffman
Javi
  • Just an idea: how about grabbing the first n bytes and then closing the connection? In most cases it should be possible to guess the content type from just the beginning of the file. But I am no pro here. – pintxo Apr 27 '11 at 09:55
  • @pintxo why would you do that when you can read the `Content-Type` header, and instead of fetching the whole response with `GET` just execute `HEAD`? – To Kra May 07 '15 at 12:40
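For reference, pintxo's idea of inspecting only the first bytes could look roughly like the sketch below. It assumes the JDK's `URLConnection.guessContentTypeFromStream`, which recognises only a limited set of file signatures (GIF and PNG among them, very few audio or video formats), and it assumes the server honours a `Range` header, which not every server does. The class and method names are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of any answer in this thread.
class ContentSniffer {

    // Guess a type from bytes already in memory; returns null if unknown.
    static String guessFromPrefix(byte[] prefix) {
        try {
            // ByteArrayInputStream supports mark/reset, which the JDK method needs.
            return URLConnection.guessContentTypeFromStream(new ByteArrayInputStream(prefix));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Ask for the first 16 bytes only and guess from those.
    static String guessFromUrl(String urlName) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(urlName).openConnection();
        connection.setRequestProperty("Range", "bytes=0-15"); // assumption: server may ignore this
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);
        try (InputStream in = new BufferedInputStream(connection.getInputStream())) {
            return URLConnection.guessContentTypeFromStream(in);
        } finally {
            connection.disconnect(); // stop reading even if the server sent the full body
        }
    }
}
```

Since the guess can come back `null`, this is probably best treated as a fallback for servers that send no usable `Content-Type` header.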

4 Answers

34

Thanks to DaveHowes' answer and some googling about how to issue a HEAD request, I got it working this way:

URL url = new URL(urlname);
HttpURLConnection connection = (HttpURLConnection)  url.openConnection();
connection.setRequestMethod("HEAD");
connection.connect();
String contentType = connection.getContentType();
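As a slightly fuller sketch of the same approach, the snippet below wraps the HEAD request in a method and adds the image/audio/video check from the question. The class name, the method names and the 5-second timeouts are assumptions chosen for illustration; the timeouts are there only because of the 30-second request limit mentioned in the question.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

// Hypothetical helper class for illustration.
class MediaTypeCheck {

    // Issue a HEAD request so only headers are transferred, never the body.
    static String fetchContentType(String urlName) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(urlName).openConnection();
        connection.setRequestMethod("HEAD");
        connection.setConnectTimeout(5000); // assumed values, tune as needed
        connection.setReadTimeout(5000);
        try {
            return connection.getContentType(); // null if the header is missing
        } finally {
            connection.disconnect();
        }
    }

    // Map a Content-Type value such as "image/png; charset=binary" to a coarse kind.
    static String mediaKind(String contentType) {
        if (contentType == null) {
            return "other";
        }
        String type = contentType.trim().toLowerCase(Locale.ROOT);
        if (type.startsWith("image/")) return "image";
        if (type.startsWith("audio/")) return "audio";
        if (type.startsWith("video/")) return "video";
        return "other";
    }
}
```

It would be called as `MediaTypeCheck.mediaKind(MediaTypeCheck.fetchContentType(urlname))`. Note that the result is only as reliable as the header the server chooses to send.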
Javi
22

If the "other" end supports it, could you use the HEAD HTTP method?

DaveH
  • Be aware of redirects; I faced the same problem with my remote content check. See my code below, where I check for them. – To Kra May 07 '15 at 12:39
15

Be aware of redirects; I faced the same problem with my remote content check.
Here is my fix:

/**
 * Http HEAD Method to get URL content type
 *
 * @param urlString
 * @return content type
 * @throws IOException
 */
public static String getContentType(String urlString) throws IOException{
    URL url = new URL(urlString);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("HEAD");
    if (isRedirect(connection.getResponseCode())) {
        String newUrl = connection.getHeaderField("Location"); // get redirect url from "location" header field
        logger.warn("Original request URL: '{}' redirected to: '{}'", urlString, newUrl);
        return getContentType(newUrl);
    }
    String contentType = connection.getContentType();
    return contentType;
}

/**
 * Check status code for redirects
 * 
 * @param statusCode
 * @return true if matched redirect group
 */
protected static boolean isRedirect(int statusCode) {
    if (statusCode != HttpURLConnection.HTTP_OK) {
        if (statusCode == HttpURLConnection.HTTP_MOVED_TEMP
            || statusCode == HttpURLConnection.HTTP_MOVED_PERM
                || statusCode == HttpURLConnection.HTTP_SEE_OTHER) {
            return true;
        }
    }
    return false;
}

You could also add a counter such as maxRedirectCount to avoid an infinite redirect loop, but that is not covered here. This is just for inspiration.
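A minimal sketch of such a redirect limit, written as a loop instead of recursion, might look like this. The limit of 5, the timeouts and the class name are assumptions. Automatic redirect handling is switched off so that every hop is counted, relative `Location` values are resolved against the current URL, and the 307 and 308 status codes are treated as redirects too. The cast assumes every hop is an http or https URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical variant of the method above with a redirect limit.
class RedirectAwareContentType {

    static final int MAX_REDIRECTS = 5; // assumed limit

    static String getContentType(String urlString) throws IOException {
        URL url = new URL(urlString);
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("HEAD");
            connection.setInstanceFollowRedirects(false); // count every hop ourselves
            connection.setConnectTimeout(5000);
            connection.setReadTimeout(5000);
            try {
                int status = connection.getResponseCode();
                if (!isRedirect(status)) {
                    return connection.getContentType();
                }
                String location = connection.getHeaderField("Location");
                if (location == null) {
                    throw new IOException("Redirect without Location header from " + url);
                }
                url = new URL(url, location); // also resolves relative locations
            } finally {
                connection.disconnect();
            }
        }
        throw new IOException("More than " + MAX_REDIRECTS + " redirects for " + urlString);
    }

    // 301, 302, 303 plus the newer 307 and 308 codes.
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM
                || status == HttpURLConnection.HTTP_MOVED_TEMP
                || status == HttpURLConnection.HTTP_SEE_OTHER
                || status == 307
                || status == 308;
    }
}
```

The loop form keeps the hop count in one place and avoids growing the call stack on long redirect chains.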

To Kra
  • Nice. Why do you need to ask: if (statusCode != HttpURLConnection.HTTP_OK) { – Dejell Apr 17 '16 at 18:37
  • @Dejell it's for handling redirects – To Kra Apr 19 '16 at 08:03
  • You can use `java.net.HttpURLConnection.setFollowRedirects(boolean)` in order to reduce the size of your boilerplate code. – Bass Oct 29 '19 at 14:07
  • `setFollowRedirects` seems to be `true` by default https://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html#setFollowRedirects(boolean) – wz366 Dec 01 '21 at 04:44
0

I faced a similar task where I needed to check the content type of a URL, and I managed it with Retrofit. First, define an endpoint that you can call with the URL you want to check:

@GET
suspend fun getContentType(@Url url: String): Response<Unit>

Then you call it like this to get the content type header:

api.getContentType(url).headers()["content-type"]
Astrit Veliu