
The problem is that although I know how to download a file from a URL such as:

http://i12.photobucket.com/albums/a206/zxc6/1_zps3e6rjofn.jpg


when it comes to a URL like the one below:

https://images.duckduckgo.com/iu/?u=http%3......

I have no clue how to download it.


The code I am using to download files with IOUtils works great when the extension is visible, but for the example above it returns:

java.io.IOException: Server returned HTTP response code: 500 for URL: https://images.duckduckgo.com/iu/?u=http%3A%2F%2Fimages2.fanpop.com%2Fimage%2Fphotos%2F8900000%2FFirefox-firefox-8967915-1600-1200.jpg&f=1

The same happens even if you remove the &f=1.


Code for Downloader (a prototype, for testing purposes):

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

import org.apache.commons.io.IOUtils;

public class Downloader {

    private static class ProgressListener implements ActionListener {

        @Override
        public void actionPerformed(ActionEvent e) {
            // e.getSource() gives you the DownloadProgressListener object,
            // because it is set as the event source in the overridden
            // method, afterWrite().
            System.out.println("Downloaded bytes : " + ((DownloadProgressListener) e.getSource()).getByteCount());
        }
    }

    /**
     * Main Method
     * 
     * @param args
     */
    public static void main(String[] args) {
        URL dl = null;
        File fl = null;
        OutputStream os = null;
        InputStream is = null;
        ProgressListener progressListener = new ProgressListener();
        try {
            fl = new File(System.getProperty("user.home").replace("\\", "/") + "/Desktop/image.jpg");
            dl = new URL(
                "https://images.duckduckgo.com/iu/?u=http%3A%2F%2Fimages2.fanpop.com%2Fimage%2Fphotos%2F8900000%2FFirefox-firefox-8967915-1600-1200.jpg&f=1");

            // http://i12.photobucket.com/albums/a206/zxc6/1_zps3e6rjofn.jpg

            // Open one connection and reuse it for both the headers and the body
            // (calling openStream() and openConnection() separately sends two requests).
            URLConnection connection = dl.openConnection();
            is = connection.getInputStream();
            os = new FileOutputStream(fl);

            DownloadProgressListener dcount = new DownloadProgressListener(os);
            dcount.setListener(progressListener);

            // This line gives you the total length of the source stream as a String.
            // You may want to convert it to an integer and store the value to
            // calculate the percentage of the progress.
            System.out.println("Content Length:" + connection.getHeaderField("Content-Length"));
            System.out.println("Content Type:" + connection.getContentType());

            System.out.println("\n");

            // Begin the transfer by writing to dcount, not os.
            IOUtils.copy(is, dcount);

        } catch (Exception e) {
            System.out.println(e);
        } finally {
            IOUtils.closeQuietly(os);
            IOUtils.closeQuietly(is);
        }
    }
}

Code for DownloadProgressListener:

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.commons.io.output.CountingOutputStream;

public class DownloadProgressListener extends CountingOutputStream {

    private ActionListener listener = null;

    public DownloadProgressListener(OutputStream out) {
        super(out);
    }

    public void setListener(ActionListener listener) {
        this.listener = listener;
    }

    @Override
    protected void afterWrite(int n) throws IOException {
        super.afterWrite(n);
        if (listener != null) {
            listener.actionPerformed(new ActionEvent(this, 0, null));
        }
    }

}

Questions I have read before posting:

1) Download file from url that doesn't end with .extension

2) http://www.mkyong.com/java/how-to-get-url-content-in-java/

3) Download file using java apache commons?

4) How to download and save a file from Internet using Java?

5) How to create file object from URL object

GOXR3PLUS
  • This has nothing to do with the extension. – shmosel Jan 28 '17 at 00:25
  • @shmosel You can correct the title if I am wrong. That's how I thought of it. Does it have to do with redirections? – GOXR3PLUS Jan 28 '17 at 00:26
  • As shmosel pointed out, extension doesn't matter. The issue is attempting to download something that's probably a redirect or some other query. I'm not sure of any easy solution, but if you look at: `https://images.duckduckgo.com/iu/?u=http%3A%2F%2Fimages2.fanpop.com%2Fimage%2Fphotos%2F8900000%2FFirefox-firefox-8967915-1600-1200.jpg&f=1`, there is in fact a URL to the image there that you could parse. – Christopher Schneider Jan 28 '17 at 00:45

2 Answers


As pointed out in the comments, extension is irrelevant.

The issue here is attempting to download something that's probably a re-direct or maybe just an async call's parameters.

Your extra-long URL without an extension is broken, but I can offer a potential solution for the other type.

If you observe the URL:

https://images.duckduckgo.com/iu/?u=http%3A%2F%2Fimages2.fanpop.com%2Fimage%2Fphotos%2F8900000%2FFirefox-firefox-8967915-1600-1200.jpg&f=1

the URL to the image is actually there. It's just encoded and should be pretty easy to decode. There are decoding libraries included in Java (java.net.URLDecoder), but should you wish to do it yourself, you can look at it this way:

http%3A%2F%2Fimages2.fanpop.com%2Fimage%2Fphotos%2F8900000%2FFirefox-firefox-8967915-1600-1200.jpg&f=1

The encoded portions have the form %XX, where XX is two hexadecimal digits giving the character's byte value. Looking at a URL encoding table, you'll see that %3A is a colon and %2F is a forward slash.

If you replace all the encoded sequences, you'll end up with: http://images2.fanpop.com/image/photos/8900000/Firefox-firefox-8967915-1600-1200.jpg&f=1

In this case, you don't want the extra parameter, so you can discard the &f=1 and download the image from the decoded original URL. In most cases, I imagine you can keep the extra parameter and it'll just be ignored.

--

In a nutshell:

  1. Extract the original URL
  2. Decode it
  3. Download
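The first two steps can be sketched as follows. This is a minimal sketch, not a general solution: it assumes the original URL is always carried in a query parameter named `u`, as in the DuckDuckGo example above, and the class and method names (`EmbeddedUrlExtractor`, `extractEmbeddedUrl`) are made up for illustration. Splitting the query on `&` before decoding also takes care of discarding `&f=1`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper: pulls the original URL out of a proxy-style URL.
class EmbeddedUrlExtractor {

    /** Returns the decoded value of the "u" query parameter, or null if there is none. */
    static String extractEmbeddedUrl(String proxyUrl) {
        int queryStart = proxyUrl.indexOf('?');
        if (queryStart < 0) {
            return null; // no query string at all
        }
        String query = proxyUrl.substring(queryStart + 1);
        // Split on the raw '&' first, so parameters such as f=1 are dropped
        // before any decoding happens.
        for (String pair : query.split("&")) {
            if (pair.startsWith("u=")) {
                try {
                    return URLDecoder.decode(pair.substring(2), "UTF-8");
                } catch (UnsupportedEncodingException e) {
                    // Every Java platform is required to support UTF-8.
                    throw new IllegalStateException(e);
                }
            }
        }
        return null; // assumption did not hold: no "u" parameter
    }

    public static void main(String[] args) {
        String proxyUrl = "https://images.duckduckgo.com/iu/?u=http%3A%2F%2Fimages2.fanpop.com"
                + "%2Fimage%2Fphotos%2F8900000%2FFirefox-firefox-8967915-1600-1200.jpg&f=1";
        // prints http://images2.fanpop.com/image/photos/8900000/Firefox-firefox-8967915-1600-1200.jpg
        System.out.println(extractEmbeddedUrl(proxyUrl));
    }
}
```

Note that `URLDecoder` follows form-encoding rules, so a literal `+` in the encoded value would be turned into a space. The decoded string can then be passed to whatever download code already works for plain image URLs.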

I'd like to point out this is a fragile solution and will break if the URL pattern changes, or it would require a lot of maintenance. If you're targeting more than a small group of users, you should re-think your approach.

HTML URL encoding table

Christopher Schneider
  • Thank you Christopher, now I understand it after reading the links you provided, and I successfully downloaded the image. I am following steps 1, 2, 3 that you listed at the end. The hard part is step 1: how to extract the original URL? For example, here you got `http://images2.fanpop.com/image/photos/8900000/Firefox-firefox-8967915-1600-1200.jpg&f=1` after cutting off part of the decoded URL (`https://images.duckduckgo.com/iu/?u=`). Steps 2 and 3 were easy. – GOXR3PLUS Jan 28 '17 at 01:41
  • I'm not sure I can answer that for you. Are you wanting to download from multiple places? Is it just duckduckgo? You'll need to figure out how to extract the URL depending on the source. Looking at the URL, if they always prefix every image URL with `https://images.duckduckgo.com/iu/?u=` you can just snip that off the beginning of the string and you're done. You could also look for `http%3A%2F%2F` and use that as the beginning of the URL. Regex would be fine here. Look at `java.util.regex.Matcher`. Specifically `find()` and `start()` – Christopher Schneider Jan 28 '17 at 02:02
  • Re-reading your comment, I don't think I addressed it. As I said in my original comment on your question, there's no simple solution, and I'm not sure how you're actually implementing this. If all you've got is URLs, you'll need to create some sort of algorithm that is able to parse URLs and try to find URLs embedded in those URLs. – Christopher Schneider Jan 28 '17 at 02:08
  • Yes, detecting `http%3A%2F%2F` is one possible solution. But will it work for other search engines like Google? I want the user to be able to download files in general using my application. Also, using the code found at ( http://www.mkyong.com/java/java-httpurlconnection-follow-redirect-example/ ), I see that `html.toString()` returns the content of a `.jpg` image, so I tried using FileWriter to export it into a `File` and saved it as `.jpg`. When I tried to open it with Painter, it didn't work. Strange. – GOXR3PLUS Jan 28 '17 at 03:06
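The `Matcher.find()` / `start()` suggestion from the comments above might look roughly like this. It is only a sketch under the assumption that the embedded URL starts at an encoded `http://` or `https://` marker and runs up to the next `&` (or the end of the string); `EncodedUrlFinder` and `findEncodedUrl` are hypothetical names, and other sites may embed URLs differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: locates a percent-encoded URL embedded in another URL.
class EncodedUrlFinder {

    // Matches the encoded form of "http://" or "https://", in either letter case.
    private static final Pattern ENCODED_SCHEME =
            Pattern.compile("https?%3A%2F%2F", Pattern.CASE_INSENSITIVE);

    /** Returns the still-encoded embedded URL, or null if no marker is found. */
    static String findEncodedUrl(String url) {
        Matcher matcher = ENCODED_SCHEME.matcher(url);
        if (!matcher.find()) {
            return null;
        }
        int start = matcher.start();
        // The embedded value ends at the next parameter separator, if there is one.
        int end = url.indexOf('&', start);
        return end < 0 ? url.substring(start) : url.substring(start, end);
    }

    public static void main(String[] args) {
        String url = "https://images.duckduckgo.com/iu/?u=http%3A%2F%2Fimages2.fanpop.com"
                + "%2Fimage%2Fphotos%2F8900000%2FFirefox-firefox-8967915-1600-1200.jpg&f=1";
        System.out.println(findEncodedUrl(url));
    }
}
```

The result is still percent-encoded, so it would need to go through `java.net.URLDecoder` before being used as a download URL.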

If you want a "quick and dirty" way to solve the problem, look at @Christopher Schneider's answer. (But it might break if DuckDuckGo's URL syntax changes ... )

I did a bit of digging around (using curl --trace-ascii, etc). This is not an issue with redirection. According to curl, the 500 is the immediate response to the request.

So my best guess is that this behavior is "by design". The server is looking at the request headers (e.g. the "User-Agent" header) and deciding that your request doesn't look like it comes from a supported browser. The 500 response is deliberate or accidental obfuscation.

Why?

Most likely, the folks who run DuckDuckGo don't want you using that server endpoint for automated downloads, scraping, whatever. They are not entirely clear about it, but this link goes some way to explaining:

Solution?

Don't do it! See if you can do what you are trying to do using their official APIs (see above). If that doesn't work, contact them.

Stephen C
  • I want the user to be able to download files in general using my application. So the problem seems complicated, but thanks to Christopher's answer I managed to find a workaround :) . Also, using the code found at ( http://www.mkyong.com/java/java-httpurlconnection-follow-redirect-example/ ), I see that `html.toString()` returns the content of a `.jpg` image, so I tried using FileWriter to export it into a `File` and saved it as `.jpg`. When I tried to open it with Painter, it didn't work. Strange. – GOXR3PLUS Jan 28 '17 at 03:04
  • +1. I agree with this answer. My answer was definitely "quick and dirty," but I was answering based on the info provided, which was a URL string. I've written many scripts to do things like this, but I only write them for me and maybe a few other devs with an asterisk that they're fragile and will break at some point. If @GOXR3PLUS is looking to write legit, consumer facing software, they should scrap their idea and follow this answer. – Christopher Schneider Jan 28 '17 at 03:42
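On the `FileWriter` problem mentioned in the comments: the code that was tried is not shown, so this is an assumption, but a likely cause is that the image bytes were read into a `String` and written back through a character-based writer, which does not preserve binary data. A minimal sketch of copying the raw bytes instead is below; `BinaryCopy` and `saveBytes` are illustrative names, and the in-memory sample stands in for a real connection's input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical helper: saves a stream to disk without any character conversion.
class BinaryCopy {

    /** Copies the stream to the target file byte for byte; returns the number of bytes written. */
    static long saveBytes(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data: a few bytes that are not valid text, like the start of a JPEG.
        byte[] sample = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 0x00, 0x10};
        Path target = Files.createTempFile("image", ".jpg");
        try (InputStream in = new ByteArrayInputStream(sample)) {
            saveBytes(in, target);
        }
        // The file content is identical to the source bytes.
        System.out.println(Arrays.equals(sample, Files.readAllBytes(target)));
        Files.delete(target);
    }
}
```

The `IOUtils.copy(is, dcount)` call in the question's code already works on bytes in the same way, so the download code there does not have this problem.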