
This is a follow-up to my popular and technically challenging question, "HTML injection into someone else's website?"

To recap: I'd like to demo my technology to customers without actually modifying their live website (e.g. demoing the idea of Stack Overflow financial bounties without modifying the live site). Essentially, I'm trying to create a server-side version of Greasemonkey.

I've implemented the mirror as follows:

  1. A request comes into http://myserver.com/forward?uri=[remote]
  2. My server opens a connection to [remote], pulls down the data, and returns its body/headers in response to the request from #1.

I chose this syntax because I needed to handle requests from multiple domains (meaning, if stackoverflow.com links to meta.stackoverflow.com I need to handle both domains from the same forwarding server).
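The mapping described above can be sketched as a pair of helper functions. This is only a minimal sketch, not my actual implementation: `FORWARD_BASE`, `toForwardUrl`, and `fromForwardUrl` are hypothetical names, and it assumes the `uri` parameter carries the percent-encoded absolute URL of the remote resource.

```javascript
// Hypothetical helpers; the names and the exact encoding are assumptions.
const FORWARD_BASE = 'http://myserver.com/forward';

// Resolve `link` against the page it was found on, then wrap it so the
// browser requests it through the forwarding server.
function toForwardUrl(link, pageUrl) {
  const absolute = new URL(link, pageUrl); // handles relative and absolute links
  return FORWARD_BASE + '?uri=' + encodeURIComponent(absolute.href);
}

// Recover the remote URL from a forwarded request URL.
function fromForwardUrl(forwardUrl) {
  return new URL(forwardUrl).searchParams.get('uri');
}

// A relative link on stackoverflow.com and an absolute link to
// meta.stackoverflow.com both end up pointing at the same forwarding server.
console.log(toForwardUrl('/questions', 'http://stackoverflow.com/'));
console.log(toForwardUrl('http://meta.stackoverflow.com/', 'http://stackoverflow.com/'));
```

Because the remote host travels inside the query string, one forwarding server can serve any number of domains without per-domain configuration.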

I have managed to rewrite links in the HTML and CSS files so they are relative to my server. The final hurdle is rewriting URLs referenced by JavaScript files.

What is the best way to programmatically rewrite URLs referenced by someone else's JavaScript code? Is this even technically doable?


Discussion

I'll give you an example of the technical hurdle I am facing. Take http://www.honda.com/ for example. They embed a Flash element on the page, but instead of embedding <object> directly, they use JavaScript to document.write() the <object> tag containing the URL.
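To illustrate why scanning the script source is not enough, here is a hypothetical script written in the same style. It is not Honda's actual code: the host, path, and variable names are invented, and `findUrlLiterals` is a made-up helper standing in for a naive server-side scan.

```javascript
// Hypothetical third-party script, held as text the way a proxy would see it.
const scriptSource = [
  "var host = 'www.example.com';",
  "var movie = '/flash/intro.swf';",
  "document.write('<object data=\"http://' + host + movie + '\"></object>');",
].join('\n');

// Naive scan for absolute URL literals in the source text.
function findUrlLiterals(source) {
  return source.match(/https?:\/\/[^\s'"]+/g) || [];
}

// The complete URL only exists once the script runs and concatenates the
// pieces, so the scan finds no usable literal in this source.
console.log(findUrlLiterals(scriptSource));
```

The same scan does find a URL that is written out as a single string literal, which is why this approach works on some scripts and silently fails on others.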

First attempt

  1. Use https://stackoverflow.com/a/14570614/14731 to listen for DOM change events. Instead of trying to rewrite URLs in the JavaScript code, wait for it to modify the DOM and rewrite the URLs on the fly.
  2. Intercept all XMLHttpRequest requests using https://stackoverflow.com/a/629782/14731

Ideally we want to intercept DOM changes before they render, so the browser does not request URLs before we have a chance to rewrite them.
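A rough sketch of both steps follows. It is not necessarily what the linked answers do: `rewriteUrl`, `patchXhr`, and `observeDom` are hypothetical names, the rewrite rule assumes absolute URLs (it does not resolve relative ones or skip URLs that were already rewritten), and only the top-level nodes of each insertion are inspected.

```javascript
// Hypothetical rewrite rule; assumes `url` is absolute and not yet rewritten.
function rewriteUrl(url) {
  return 'http://myserver.com/forward?uri=' + encodeURIComponent(url);
}

// Wrap open() so every request URL passes through `rewrite` first.
function patchXhr(XhrClass, rewrite) {
  const originalOpen = XhrClass.prototype.open;
  XhrClass.prototype.open = function (method, url, ...rest) {
    return originalOpen.call(this, method, rewrite(String(url)), ...rest);
  };
}

// Rewrite URL-carrying attributes on elements as they are added to the DOM.
function observeDom(root, rewrite) {
  const observer = new MutationObserver((mutations) => {
    for (const mutation of mutations) {
      for (const node of mutation.addedNodes) {
        if (node.nodeType !== 1) continue; // skip text and comment nodes
        for (const attr of ['src', 'href', 'data']) {
          if (node.hasAttribute(attr)) {
            node.setAttribute(attr, rewrite(node.getAttribute(attr)));
          }
        }
      }
    }
  });
  observer.observe(root, { childList: true, subtree: true });
  return observer;
}

// Install only where the browser APIs exist, so the file loads elsewhere too.
if (typeof XMLHttpRequest !== 'undefined') {
  patchXhr(XMLHttpRequest, rewriteUrl);
}
if (typeof MutationObserver !== 'undefined' && typeof document !== 'undefined') {
  observeDom(document.documentElement, rewriteUrl);
}
```

Note that a MutationObserver callback runs after the mutation has been applied, so the browser may already have started fetching the original URL by the time the attribute is rewritten. That is exactly the timing problem described above.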

Second attempt

A server-side solution will not work. Even if I can rewrite all DOM URLs, I've seen an example where an embedded Flash application references URLs stored in JavaScript variables. There is no programmatic way to detect that these variables represent URLs because the Flash application is opaque.

Next, I plan on trying a client-side solution. I will load the original website in one frame and manipulate its contents using JavaScript in a second (hidden) frame. I hope to be able to inject new DOM elements (to demo my product) without having to rewrite the existing elements.

Gili
  • I hope you understand that you cannot fix all possible URLs that might be constructed via JavaScript, since some of them may be strings that are put together from other components that you may not be able to recognize by scanning the code. – jfriend00 Mar 09 '14 at 08:50
  • @jfriend00 You're right, which is why I'm thinking that intercepting DOM changes might be a more reliable approach. – Gili Mar 09 '14 at 22:08
  • Let's start from the top, why do you not want to modify the live page to begin with? Also, what is the desired end-product, what should your tech do and how should customers see it? Do you want to display an image of the page, a live browseable view, something else? – Etheryte Mar 11 '14 at 00:59
  • @Nit, I plan on sending the customer a link to a live browseable view of the modified site. If the customer likes what they see, they will modify the live site. The only thing our tech will need to do is configure where in the view to inject the HTML button (e.g. injecting a PayPal button into the Stack Overflow site). As an aside, are you still sure that the `demo` and `html-injections` tags should have been removed? Both sound quite relevant given the requirements I just explained above. – Gili Mar 11 '14 at 13:31

2 Answers

1

Very challenging and interesting task. I would start by saving the JavaScript files on my server and referencing them from the HTML I serve. Then I would find the URLs in the files (using a regex or something similar) and replace them with the desired values. I know it is not very fast or very dynamic, but I believe it would be easier to implement.

Hope I helped!

Pantelis Natsiavas
  • Don't take this the wrong way but the entire point of this question is to figure out how to "find the URLs in the files". Meaning, this answer doesn't really answer my question :) – Gili Mar 09 '14 at 22:02
  • Understood. Do you want me to remove it? Do you think it could confuse someone reading it? – Pantelis Natsiavas Mar 10 '14 at 06:12
  • Yes please. Questions without any answers tend to get more attention from the search engine. – Gili Mar 10 '14 at 18:28
-1

Answering my own question.

After much research, I found that this technique works best: https://stackoverflow.com/a/23231268/14731

In other words, there doesn't seem to be a general algorithm to rewrite links. Patching them by hand isn't as much work as you'd expect.

Gili