2

I'm trying to programmatically find the final destination of a Bing link:

https://www.bing.com/ck/a?!&&p=e8e1e7228136c509JmltdHM9MTY1OTM5MTI0MiZpZ3VpZD1jY2RlYTU1Yy1kYzRkLTRjNjctOTIwMC1hZTUwYTk4M2QyNzImaW5zaWQ9NTcwOQ&ptn=3&hsh=3&fclid=62b91a1d-11e5-11ed-88df-bbbd25b14f27&u=a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw&ntb=1

In a browser, this redirects to https://www.danielshvac.com/

However, when I request the link from Python and look for an HTTP redirect from the first URL to the second, no redirection is reported.

What's going on, and how can I find the final destination of these bing.com/ck/a links?

Code:

  1. Based on this SO answer
import requests
r = requests.get('https://www.bing.com/ck/a?!&&p=e8e1e7228136c509JmltdHM9MTY1OTM5MTI0MiZpZ3VpZD1jY2RlYTU1Yy1kYzRkLTRjNjctOTIwMC1hZTUwYTk4M2QyNzImaW5zaWQ9NTcwOQ&ptn=3&hsh=3&fclid=62b91a1d-11e5-11ed-88df-bbbd25b14f27&u=a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw&ntb=1') 
print(r.url) # https://www.bing.com/ck/a?!&&p=e8e1e7228136c509JmltdHM9MTY1OTM5MTI0MiZpZ3VpZD1jY2RlYTU1Yy1kYzRkLTRjNjctOTIwMC1hZTUwYTk4M2QyNzImaW5zaWQ9NTcwOQ&ptn=3&hsh=3&fclid=62b91a1d-11e5-11ed-88df-bbbd25b14f27&u=a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw&ntb=1
  2. Based on this SO answer
response = requests.get('https://www.bing.com/ck/a?!&&p=e8e1e7228136c509JmltdHM9MTY1OTM5MTI0MiZpZ3VpZD1jY2RlYTU1Yy1kYzRkLTRjNjctOTIwMC1hZTUwYTk4M2QyNzImaW5zaWQ9NTcwOQ&ptn=3&hsh=3&fclid=62b91a1d-11e5-11ed-88df-bbbd25b14f27&u=a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw&ntb=1')
if response.history:
    print("Request was redirected")
    for resp in response.history:
        print(resp.status_code, resp.url)
    print("Final destination:")
    print(response.status_code, response.url)
else:
    print("Request was not redirected") # this is printed

Update: fetching the link with curl returns an HTML document whose JavaScript performs the redirect in the browser, which I guess is why no HTTP redirect shows up.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="referrer" content="origin-when-cross-origin">
    <script>//<![CDATA[
      var s = false;
      function l() {
        setTimeout(f, 10000);
        if (document.referrer) {
          try {
            var pm = /(^|&|\?)px=([^&]*)(&|$)/i;
            var px = window.location.href.match(pm);
            var rs = document.referrer;
            if (px != null) {
              if (rs.match(pm))
                rs = rs.replace(pm, "$1px=" + px[2] + "$3");
              else if (rs.indexOf("?") != -1)
                rs = rs + "&px=" + px[2];
              else
                rs = rs + "?px=" + px[2];
            }
            history.replaceState({}, "Bing", rs);
            window.addEventListener("pageshow", function(e) { if (e.persisted || (typeof window.performance != "undefined" && window.performance.navigation.type === 2)) window.location.reload(); });
            s = true;
            setTimeout(r, 10);
            return;
          } catch (e) {}
        }
        r();
      }
      function r() {
        var u = "https://www.danielshvac.com/";
        if (s)
          window.location.href = u;
        else
          window.location.replace(u);
      }
      function f() {
        document.getElementById("fb").style.display = "block";
      }
      //]]>
    </script>
  </head>
  <body onload="l()">
    <div id="fb" style="display: none">
      Please <a href="https://www.bing.com/ck/a?!&&p=e8e1e7228136c509JmltdHM9MTY1OTM5MTI0MiZpZ3VpZD1jY2RlYTU1Yy1kYzRkLTRjNjctOTIwMC1hZTUwYTk4M2QyNzImaW5zaWQ9NTcwOQ&ptn=3&hsh=3&fclid=62b91a1d-11e5-11ed-88df-bbbd25b14f27&u=a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw&ntb=F">click here</a> if the page does not redirect automatically ...
    </div>
  </body>
</html>

Now I'm trying to figure out how to execute this script and get the destination link.

Joey Baruch

3 Answers

3

The u parameter contains the destination. Strip the leading a1, pad the rest with == so its length is a multiple of four, and base64-decode it: aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw==
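A minimal sketch of that decoding in Python, using only the standard library. The function name decode_bing_target is hypothetical, and both the a1 prefix and the use of URL-safe base64 are assumptions drawn from this one example link, not documented Bing behaviour.

```python
import base64
from urllib.parse import parse_qs, urlparse


def decode_bing_target(link):
    """Return the destination encoded in a bing.com/ck/a link, or None."""
    params = parse_qs(urlparse(link).query)
    values = params.get("u")
    if not values:
        return None
    u = values[0]
    # Assumption: the payload carries an "a1" prefix, as in the example link.
    if not u.startswith("a1"):
        return None
    payload = u[2:]
    # The link omits base64 padding, so restore it to a multiple of four.
    payload += "=" * (-len(payload) % 4)
    # Assumption: URL-safe alphabet ("-" and "_" instead of "+" and "/").
    return base64.urlsafe_b64decode(payload).decode("utf-8")


link = (
    "https://www.bing.com/ck/a?!&&p=e8e1e7228136c509JmltdHM9MTY1OTM5MTI0MiZpZ3VpZD1j"
    "Y2RlYTU1Yy1kYzRkLTRjNjctOTIwMC1hZTUwYTk4M2QyNzImaW5zaWQ9NTcwOQ&ptn=3&hsh=3"
    "&fclid=62b91a1d-11e5-11ed-88df-bbbd25b14f27"
    "&u=a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw&ntb=1"
)
print(decode_bing_target(link))
```

No network request is needed for this, since the destination is carried in the link itself. If a link lacks the u parameter or the expected prefix, the sketch returns None rather than guessing.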

Kjortel
  • Hi @Kjortel, I wanted to ask how did you know to remove the `a1` prefix, and if it's always a1, always remove first 2 chars, or how I should think about this? – Joey Baruch Jun 15 '23 at 19:17
2

Since the curl output shows that the script inside the HTML document already contains the destination URL, you can extract it with one line of Python:

r.text.split('var u = "')[1].split('";')[0]

This splits the response body (the same content you got from curl) at the opening quote of the URL variable's value and again at its closing quote, then takes the first piece, so you get only the destination URL.
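A slightly more defensive variant of the same idea, sketched with a regular expression so that a missing assignment returns None instead of raising IndexError. The name extract_js_target is hypothetical, and the pattern assumes the page keeps the `var u = "..."` form seen in the curl output above; the sample string stands in for the real response body.

```python
import re

# Matches the assignment seen in the curl output: var u = "https://...";
TARGET_RE = re.compile(r'var\s+u\s*=\s*"([^"]*)"')


def extract_js_target(html):
    """Return the URL assigned to `u` in the redirect script, or None."""
    match = TARGET_RE.search(html)
    return match.group(1) if match else None


# Stand-in for r.text; copied from the script in the question.
sample = '''
      function r() {
        var u = "https://www.danielshvac.com/";
        if (s)
          window.location.href = u;
      }
'''
print(extract_js_target(sample))
```

With a live response you would pass r.text instead of sample. This still depends on how Bing generates the page, so it shares the fragility raised in the comments below.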

DevEmperor
  • Problem with that approach is that it makes assumptions of how Bing generates these files. If they change it in the future, then our code breaks. – Joey Baruch Aug 05 '22 at 18:11
  • @JoeyBaruch That is true, however I don't see any other way to solve this problem... And even if they'll change the code, it would be easy to update the parsing :) – DevEmperor Aug 06 '22 at 19:46
0

I recognized the string "aHR0cHM6" which is "https:" base64-encoded. So I guessed that removing "a1" could give a URL if I base64-decoded the rest of the string (with "==" appended for base64 completeness).
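The recognition step can be checked directly. This short sketch encodes "https:" and shows why the prefix is a useful tell: any string that starts with "http" encodes to something starting with "aHR0c". The helper name looks_like_encoded_url is hypothetical.

```python
import base64

# "https:" is six bytes, so it encodes to exactly eight characters, no padding.
https_prefix = base64.b64encode(b"https:").decode("ascii")
print(https_prefix)


def looks_like_encoded_url(value):
    """Heuristic: base64 of any text starting with "http" begins with "aHR0c"."""
    return value.startswith("aHR0c")


u_param = "a1aHR0cHM6Ly93d3cuZGFuaWVsc2h2YWMuY29tLw"
# The raw parameter fails the check; dropping the two-character prefix passes.
print(looks_like_encoded_url(u_param), looks_like_encoded_url(u_param[2:]))
```

This is only a heuristic for spotting the encoding; it says nothing about what the a1 prefix means or whether it is always two characters.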

This is a reply to your follow-up question, Joey Baruch. It may not be attached to the right place because I have no login.