
How can I get the "final location" (a.k.a. landing-page) of the following URL:

http://pixel.mathtag.com/click/img?mt_aid=3432042357544051869&mt_id=540771&mt_adid=100306&mt_sid=293670&mt_uuid=52bf1f56-6fe2-5261-010a-0bbc2fa71e3e&mt_3pck=http%3A//track.pubmatic.com/AdServer/AdDisplayTrackerServlet%3FclickData%3DJnB1YklkPTIwOTc3JnNpdGVJZD0zMDE1MSZhZElkPTI2NjA0JmthZHNpemVpZD05JnRsZElkPTAmcGFzc2JhY2s9MCZjYW1wYWlnbklkPTEyMTYmY3JlYXRpdmVJZD0wJmFkU2VydmVySWQ9MjQz_url%3D&redirect=http://weeklyad.target.com

My code (below) takes this string as input.

The output should be something like 'http://weeklyad.target.com', but instead, I just get the same URL.

Needless to say, I have not been able to solve even this specific case, but what I really need is a general solution.

Here is my simple Java code, using HttpURLConnection (where String ref is the input):

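        // Fragment of a method that returns the resolved URL as a String.
        HttpURLConnection con = (HttpURLConnection) new URL(ref).openConnection();
        con.setInstanceFollowRedirects(true); // follow HTTP 3xx redirects automatically
        con.setRequestProperty("User-Agent", "");
        // A 3xx can still surface here, e.g. when the redirect switches protocol
        // (http -> https), which HttpURLConnection does not follow on its own.
        if (con.getResponseCode() / 100 == 3)
        {
            String target = con.getHeaderField("Location");
            if (target != null)
                return target;
        }
        // After any followed redirects, getURL() is the last URL actually requested.
        return con.getURL().toString();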

Does anybody have any idea what I am doing wrong?

barak manos

1 Answer


The server returns this:

<html>
<head>
<meta http-equiv="refresh" content="1; url=http://weeklyad.target.com">
<title>Redirect</title>
<script language="javascript" type="text/javascript">
<!--
function track_click(url)
{
    var req = new Image();
    req.src = url;
}

function redirect(url)
{
    window.location = url;
}

var url_raw = "http://weeklyad.target.com";
var url_enc = "http%3A%2F%2Fweeklyad.target.com";

track_click("http://track.pubmatic.com/AdServer/AdDisplayTrackerServlet?clickData=JnB1YklkPTIwOTc3JnNpdGVJZD0zMDE1MSZhZElkPTI2NjA0JmthZHNpemVpZD05JnRsZElkPTAmcGFzc2JhY2s9MCZjYW1wYWlnbklkPTEyMTYmY3JlYXRpdmVJZD0wJmFkU2VydmVySWQ9MjQz_url=" + url_enc);

var redirect_timeout = 300;
setTimeout('redirect("http://weeklyad.target.com")', redirect_timeout);
// -->
</script></head><body></body></html>

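So the redirect is performed in the browser: the JavaScript redirect function is called via setTimeout (with a meta refresh as a fallback). It is not an HTTP redirect with a Location header, which is the only kind HttpURLConnection follows, so your code returns the original URL.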
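The meta refresh target in a response like the one above can be read without a browser. The following is only a minimal sketch, not a general solution: the class name MetaRefresh and the method findTarget are made up for illustration, the regular expression assumes the attribute order shown above (http-equiv before content), and it does nothing about redirects that only happen when JavaScript runs.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the target of a <meta http-equiv="refresh"> tag.
final class MetaRefresh {
    // Assumes http-equiv comes before content, as in the page shown above.
    private static final Pattern META_REFRESH = Pattern.compile(
        "<meta\\s+http-equiv\\s*=\\s*[\"']?refresh[\"']?\\s+content\\s*=\\s*[\"']"
            + "\\s*\\d+\\s*;\\s*url\\s*=\\s*([^\"'>\\s]+)",
        Pattern.CASE_INSENSITIVE);

    // Returns the refresh URL if the HTML contains one, otherwise an empty Optional.
    static Optional<String> findTarget(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<head><meta http-equiv=\"refresh\" "
            + "content=\"1; url=http://weeklyad.target.com\"></head>";
        System.out.println(findTarget(html).orElse("no meta refresh found"));
    }
}
```

You would read the response body from con.getInputStream() into a String and pass it to findTarget when the status code is 200 and no Location header is present. A real HTML parser would be more robust than a regular expression here.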

BTW: you can see where you will end up by looking at the original URL. Note the &redirect=http://weeklyad.target.com parameter.
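For URLs that carry the destination in a query parameter like this one, the value can be pulled out directly. This is a sketch under assumptions: RedirectParam and find are hypothetical names, the parameter name "redirect" is specific to this tracker, the example URL is a shortened version of the one in the question, and the Charset overload of URLDecoder.decode needs Java 10 or later.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper: reads one query parameter from a URL string.
final class RedirectParam {
    // Returns the decoded value of the first parameter called `name`, if any.
    static Optional<String> find(String url, String name) {
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                // Note: URLDecoder also turns '+' into a space (form decoding).
                return Optional.of(
                    URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened version of the URL from the question.
        String ref = "http://pixel.mathtag.com/click/img?mt_id=540771"
            + "&redirect=http://weeklyad.target.com";
        System.out.println(find(ref, "redirect").orElse("no redirect parameter"));
    }
}
```

This only works when the tracker happens to expose the destination in the URL, so it is a shortcut for this case rather than a way to resolve arbitrary links.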

Noam Rathaus
  • Regarding the 'BTW' remark: that may hold in this specific example, but not in many others. As I mentioned, I need a general solution. And of course, I just need the URL of the landing page, not the entire page contents. Thank you. – barak manos Dec 28 '13 at 19:40
  • You won't be able to build a generic "solution" unless you fully emulate a browser, rather than just perform an HTTP GET. – Noam Rathaus Dec 28 '13 at 19:47
  • So are you saying that I should use Selenium or something similar? – barak manos Dec 28 '13 at 19:48
  • Yes, you will need full browser emulation here. Alternatively, you could handle Location and JS redirects yourself, but you might still miss META REFRESH, so you would need to cover that too. The best option is Selenium. – Noam Rathaus Dec 28 '13 at 19:51
  • According to http://stackoverflow.com/a/5665218/1382251, even Selenium may fail to accomplish this. – barak manos Dec 28 '13 at 19:56