
A product I'm helping to develop will basically work like this:

  • A Web publisher creates a new page on their site that includes a <script> from our server.
  • When a visitor reaches that new page, the <script> gathers the text content of the page and sends it to our server via a POST request (cross-domain, using a <form> inside an <iframe>).
  • Our server processes the text content and returns a response (via JSONP) that includes an HTML fragment listing links to related content around the Web. This response is cached and served to subsequent visitors until we receive another POST with text content from the same URL, at which point we regenerate a "fresh" response. These POSTs happen only when our cache TTL expires: when it does, the server's response signals this and prompts the <script> on the page to gather and POST the text content again. (A simplified sketch of this flow follows the list.)
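
To make the flow concrete, here is a stripped-down sketch of what our <script> does (the endpoint names and response fields below are illustrative, not our real API):

```js
// Simplified sketch; "ourserver.example.com" and the response fields are placeholders.
function postPageContent() {
  // Gather the page's text content.
  var text = document.body.innerText || document.body.textContent;

  // Build a hidden <iframe> containing a <form>, so the POST
  // can cross domains (XMLHttpRequest can't).
  var iframe = document.createElement('iframe');
  iframe.style.display = 'none';
  document.body.appendChild(iframe);

  var doc = iframe.contentWindow.document;
  doc.open();
  doc.write(
    '<form method="post" action="https://ourserver.example.com/content">' +
    '<input type="hidden" name="url"><input type="hidden" name="text">' +
    '</form>');
  doc.close();

  var form = doc.forms[0];
  form.elements['url'].value = window.location.href;
  form.elements['text'].value = text;
  form.submit();
}

// JSONP callback: render the cached fragment; re-POST if the server says it's stale.
function handleRelatedLinks(response) {
  document.getElementById('related-links').innerHTML = response.html;
  if (response.stale) postPageContent();
}
var s = document.createElement('script');
s.src = 'https://ourserver.example.com/links?url=' +
        encodeURIComponent(window.location.href) + '&callback=handleRelatedLinks';
document.getElementsByTagName('head')[0].appendChild(s);
```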

The problem is that this system seems inherently insecure. In theory, anyone could forge the HTTP POST request that sends a page's content to our server (including the referer header, so we can't just check for that). The forged request could include any text content whatsoever, which we would then use to generate the related-content links for that page.

The primary difficulty in making this secure is that our JavaScript is publicly visible. We can't use any kind of private key or other cryptic identifier or pattern, because nothing in client-side code stays secret.

Ideally, we need a method that somehow verifies that a POST request corresponding to a particular Web page is authentic. We can't just scrape the Web page and compare the content with what's been POSTed, since the purpose of having JavaScript submit the content is that it may be behind a login system.

Any ideas? I hope I've explained the problem well enough. Thanks in advance for any suggestions.

Bungle
  • Curious: can your server run any server-side logic? E.g. ASPX, PHP, Rails, or *any* server-side language? – Crescent Fresh Apr 01 '10 at 16:59
  • @Crescent Fresh: It can; our app server primarily uses Rails. That would probably be feasible for the POST request (i.e. when a page sends our server its text content) but definitely not for the GET requests (i.e. when we're serving our cached HTML) which are served through Varnish - http://varnish-cache.org/. – Bungle Apr 01 '10 at 17:58
  • 2
    @Bungle: You mention "we can't use any ... cryptic identifier", but such a mechanism can be publicly observable while still remain secure. You can generate an encrypted key derived from the web publisher's domain (eg "foo.bar.com"). Your server-side code can decrypt the key to pluck the domain (among other things), and your client-side code obviously has access to `window.location` to verify the script is being included only by domains authorized to do so. Similar to what Google does with their client script apis: http://code.google.com/apis/maps/documentation/premier/guide.html#Signup – Crescent Fresh Apr 01 '10 at 18:18
  • 1
    @Bungle: nevermind, I think I see what you're asking. You want to secure the actual POST request, not the script inclusion. – Crescent Fresh Apr 01 '10 at 18:24
  • 1
    This is not an easy problem to solve. I need some more information about your application. What is a valid file upload? Are you forcing users to login? Are you only accepting posts from specific domains? You must be more specific in what should be valid vs invalid. – rook Apr 01 '10 at 19:24
  • We have had a similar problem for 3 years now, and still don't have a good solution. There just isn't a way to secure such a thing. We have left our services open, and so far haven't had 'unauthorized' people using them. You may or may not be as lucky. – Sripathi Krishnan Apr 01 '10 at 20:13
  • 1
    @Crescent Fresh: That's correct. Our thought is that anyone could generate a POST request that emulates ours, regardless of whether they're running our ` – Bungle Apr 01 '10 at 20:55
  • 1
    @The Rook: In essence, we want a way to ensure that a POST request received by our server was actually initiated by a ` – Bungle Apr 01 '10 at 21:02
  • ...simply make a bogus POST that would cause our server to use any content and URL that they supply. Please let me know if that's enough info to go on. – Bungle Apr 01 '10 at 21:02
  • @Sripathi: Thanks for your input. It does seem to be an impossible problem. I think there are measures we can take to minimize damage from any potential attacks, but I really wish there was a "slam-dunk" solution to essentially prevent them. – Bungle Apr 01 '10 at 21:10
  • 1
    @Bungle I changed my mind, this isn't a difficult problem to solve, its impossible. You need to authenticate the individual's browser in some way, you cannot authenticate a piece of client side code like this. Keep in mind that an attacker has a huge amount of control over http requests, it could be a forgery: (http://milw0rm.com/exploits/7383) – rook Apr 01 '10 at 21:37

10 Answers


There is no silver bullet for this. However, where big guns don't exist, major annoyance can. Hackers like a challenge, but they prefer an easy target. Be annoying enough that they give up.

Google and others do this effectively with AdWords. Create an API token and have publishers send it along. Have a "verification" process for sites using your script that requires the registrant to allow their site to be profiled before the script may be used. You can then collect every bit of information about the server in question, and if the server profile does not match the one on record, can the request.

Gather everything you can about the browser and client and create a profile for them. If there is any chance of browser spoofing, drop the request. If the profile repeats but the cookie is gone, ignore the input. If you get more than one request from the same token in a short period (i.e. the rapid page refreshes inherent in hack attempts), ignore the request.

Then go one step further and ping the actual domain to verify that it exists and is an authorized domain. Even if the page is behind a login, the domain itself will still respond. This in itself won't stop hackers, but it is done server-side and is therefore hidden from them.
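
A rough sketch of the token check plus domain ping (Node.js; the registration store and helper shape are placeholders, not a prescribed design):

```js
// Rough sketch (Node.js). The registration store is a stand-in for your database.
const https = require('https');

const registeredSites = {
  'token-abc123': { domain: 'example.com' /* plus the recorded server profile */ }
};

function verifyRequest(token, claimedUrl, callback) {
  const site = registeredSites[token];
  if (!site) return callback(false); // unknown token: can the request

  const host = new URL(claimedUrl).hostname;
  if (host !== site.domain && !host.endsWith('.' + site.domain)) {
    return callback(false); // token was registered for a different domain
  }

  // Ping the domain server-side. Even if the page itself is behind a
  // login, the host will still answer, and the check stays hidden.
  https.request({ host: host, method: 'HEAD', path: '/' }, res => {
    callback(res.statusCode < 500);
  }).on('error', () => callback(false)).end();
}
```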

Also, you might consider profiling the content of a page. If a site dedicated to kitchen utensils starts sending back content for adult dating, raise a red flag.

Lastly, when a request comes in that you've profiled as bad, respond with the JSONP a good request for that page would have received, based on data you know is good (a 24-hour-old version of the page, etc.). Don't tell the hacker you know they are there. Act as if everything is fine. It will take them quite a while to figure that one out!

None of these ideas fulfills the exact needs of your question, but hopefully they will inspire some insidious and creative thinking on your part.

Joe Mills

How about this: the <script/> tag that a third-party site includes has a dynamic src attribute. So, instead of loading a static JavaScript resource, the request comes to your server, which generates a unique key as an identifier for the website and sends it back in the JS response. You save that same key in the user session or in your database. The form created and submitted by this JS code submits the key as a parameter too, and your backend rejects any POST request whose key does not match the one in your db/session.
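
A minimal sketch of that handshake, assuming an Express-style server (`app`, `staticWidgetSource`, and `processContent` are placeholders):

```js
// Minimal sketch (Express-style). Key storage is a Set here; use a session or DB in practice.
const crypto = require('crypto');
const issuedKeys = new Set();

// GET /widget.js — mint a fresh key and bake it into the script we serve.
app.get('/widget.js', (req, res) => {
  const key = crypto.randomBytes(16).toString('hex');
  issuedKeys.add(key);
  res.type('application/javascript');
  res.send('var WIDGET_KEY = "' + key + '";\n' + staticWidgetSource);
});

// POST /content — accept only keys we actually issued, and only once each.
app.post('/content', (req, res) => {
  if (!issuedKeys.delete(req.body.key)) return res.sendStatus(403);
  processContent(req.body.url, req.body.text);
  res.sendStatus(204);
});
```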

avlesh

Give people keys on a per-domain basis.

Have the requests include a hash of [key string + request parameters]. (The hash should be computed server-side, so the key itself never has to appear in client code.)

When they send you the request, you, knowing both the parameters and the key, can recompute the hash and verify the request's validity.
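
For example, with an HMAC as the hash (a sketch; the parameter names are illustrative):

```js
// Sketch: recompute the hash of [key + parameters] and compare (Node.js).
const crypto = require('crypto');

function isValidSignature(domainKey, params, signature) {
  const expected = crypto.createHmac('sha256', domainKey)
    .update(params.url + '\n' + params.text)
    .digest('hex');
  if (signature.length !== expected.length) return false;
  // Constant-time comparison avoids leaking the hash via timing.
  return crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(signature));
}
```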

glebm
  • The key string is in JavaScript, though, so anyone can see it and use it to formulate a bogus POST request. – Bungle Apr 03 '10 at 22:22

The primary weakness in the system as you described it is that you are "given" the page content. Why not go and get the page content for yourself?

  1. A Web publisher creates a new page on their site that includes a script from your server.
  2. When a visitor reaches that new page, that script sends a get request to your server.
  3. Your server goes and gets the content of the page (possibly by using the referrer header to determine the source of the request).
  4. Your server processes the text content and returns a response (via JSONP) that includes an HTML fragment listing links to related content around the Web. This response is cached and served to subsequent visitors from a server-side cache/proxy.
  5. When the TTL for the cached version expires, the proxy will forward the request on to your app and the whole cycle starts again from step 3.

This stops malicious content from being "fed" to your server and lets you issue some form of API key that ties requests to domains or pages (i.e. API key 123 only works for referrers on mydomain.com; anything else is obviously spoofed). Thanks to the caching/proxy layer, your app is also protected to some degree from DoS-type attacks, because the page content is only processed once each time the cache TTL expires (and you can handle increasing load by extending the TTL until you can bring additional processing capacity online). Your client-side script also becomes insanely small and simple: no more scraping content and POSTing it, just send an Ajax request and maybe populate a couple of parameters (API key/page). A sketch of this pull model follows.
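
A sketch of that pull model (Express-style; `isAuthorizedDomain` and `buildRelatedLinks` stand in for your registration check and your existing processing):

```js
// Sketch of the pull model (Express-style). Helper functions are placeholders.
const https = require('https');

app.get('/links', (req, res) => {
  const pageUrl = req.get('Referer') || req.query.url;
  if (!isAuthorizedDomain(req.query.apiKey, pageUrl)) return res.sendStatus(403);

  // Fetch the page ourselves instead of trusting POSTed content.
  https.get(pageUrl, pageRes => {
    let body = '';
    pageRes.on('data', chunk => { body += chunk; });
    pageRes.on('end', () => {
      res.jsonp({ html: buildRelatedLinks(body) }); // cache this upstream (e.g. Varnish)
    });
  }).on('error', () => res.sendStatus(502));
});
```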

Neal
  • The problem with step 3 in this answer is spelled out in the question: the page being scraped is potentially behind a login. The user has logged in to a page whose content is being collected, and the server can't reach that page with a simple GET without impersonating the user, logging in, navigating to it, etc. I'm skeptical about whether this is a good use case we're being asked to help with, though. – Chris Moschini Oct 26 '11 at 07:25

First of all, I would validate the domain (and maybe the "server profile") as suggested by others here, and obviously very strictly validate the content of the POST (as I hope you're already doing anyway).

If you make the URL of your script file point to something dynamically generated by your server, you can also include a time-sensitive session key to be sent along with the POST. This won't completely foil anyone, but if you can make the session expire quickly enough, it becomes a lot more difficult to exploit (and if I understand your application correctly, sessions should only need to last long enough for the user to enter something after loading a page). A sketch follows.
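
One way to build such a key is an HMAC-signed timestamp (a sketch; the names and the two-minute lifetime are assumptions):

```js
// Sketch: token = timestamp + HMAC(timestamp). The secret never leaves the server.
const crypto = require('crypto');
const SECRET = process.env.TOKEN_SECRET;  // hypothetical config value
const MAX_AGE_MS = 2 * 60 * 1000;         // sessions live two minutes

function mintToken() {
  const ts = Date.now().toString();
  const mac = crypto.createHmac('sha256', SECRET).update(ts).digest('hex');
  return ts + '.' + mac;
}

function tokenIsFresh(token) {
  const parts = token.split('.');
  const expected = crypto.createHmac('sha256', SECRET).update(parts[0]).digest('hex');
  return parts[1] === expected && (Date.now() - Number(parts[0])) < MAX_AGE_MS;
}
```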

After typing this, I realize it's basically what avlesh already suggested with the addition of an expiry.

Matt Kantor

If you can add server-side code to the site pushing data to yours, you could use a MAC (message authentication code) to at least prevent non-logged-in users from sending anything.

If just anyone is allowed to use the page, then I can't think of a watertight way of confirming the data without scraping the web page. You can make sending arbitrary content somewhat more difficult with referer checks and the like, but not 100% impossible.
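
For the case where the publisher can run server-side code, the signing might look like this (a sketch; the shared secret lives on both servers and never reaches the browser):

```js
// Sketch: the publisher's server signs the page URL and a timestamp with the
// shared secret, embeds the result in the page, and the script forwards it.
const crypto = require('crypto');

function macFor(url, timestamp, sharedSecret) {
  return crypto.createHmac('sha256', sharedSecret)
    .update(url + '|' + timestamp)
    .digest('hex');
}
// Your server recomputes macFor() on each POST and rejects mismatches.
// Anonymous outsiders never see a valid MAC; a logged-in user still can,
// which is the limitation noted above.
```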

Matti Virkkunen
  • Thanks very much for your input, Matti. Unfortunately I can't add any server-side code to the site that POSTs the text content. My sense, which seems to match yours, is that securing this is technically impossible, but I'm hoping to find some way that at least makes an attack extraordinarily difficult and not worth the trouble. – Bungle Apr 01 '10 at 16:58
  • In this case a MAC is trivial to spoof. How are you supposed to keep the secret from an attacker? – rook Apr 01 '10 at 19:05
  • Using a shared secret or public/private key between the server and the service. Obviously this doesn't prevent logged in users from abusing or leaking it, only unauthenticated people. – Matti Virkkunen Apr 01 '10 at 19:39

You could issue hashed keys specific to each client's IP address, and for each POST compare that value on the server against the IP in the request. The upside is that if someone spoofs their IP, the response will still be sent to the spoofed IP and not to the attacker. You might already know this, but I'd also suggest adding salt to your hashes.

With a spoofed IP, a proper TCP handshake can't take place, so the attacker's spoofed POST never completes.

There could be other security concerns I'm not aware of, but I think it might be an option. A sketch follows.
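
Something like this, as a sketch (Express-style; the salt is a server-side config value):

```js
// Sketch: bind the issued key to the requesting IP with a salted hash.
const crypto = require('crypto');
const SALT = process.env.KEY_SALT; // server-side only, never shipped to clients

function keyForIp(ip) {
  return crypto.createHash('sha256').update(SALT + ip).digest('hex');
}

// Serving the script: embed keyForIp(req.ip) in the JS response.
// Handling the POST: recompute from the request's source IP and compare.
function postLooksAuthentic(req) {
  return req.body.key === keyForIp(req.ip);
}
```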

Brian
  • A single client might appear to be coming from different IP addresses for different requests, depending on how their ISP is configured, so the hashed key specific to a client's (first) IP might look invalid later when it's coming from another IP. It's not a common situation, but common enough that you might break some of your users in a way they can't really remedy or control. – Andrew A. Oct 26 '17 at 03:02

Can the Web publisher also put a proxy page on their server?

Then load the script through the proxy. That gives you a number of ways to control the connection between the two servers, such as adding encryption.

What is the login system? What about using an SSO solution and keeping your scripts separate?

Ruz

You could scrape the site, and if you get a 200 response that includes your script, just use that scrape. If not, you can fall back to information from your "client proxy"; that way the problem is reduced to the sites you can't scrape.

To raise security in those cases, you could have multiple users send the page and filter out any information that is not present in a minimum number of the responses. That also has the added benefit of filtering out user-specific content. Also make sure to record which users you ask to do the proxy work, and verify that you only accept pages from users you actually asked to do the job. You could also try to ensure that very active users don't get a higher chance of doing the job, which will make it harder to "fish" for it. A sketch of the consensus filter follows.
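
A sketch of that consensus filter (the line-level granularity is an assumption):

```js
// Sketch: keep only lines that appear in at least minCount of the submitted copies.
function consensusContent(submissions, minCount) {
  const counts = new Map();
  for (const text of submissions) {
    // Count each distinct line once per submission.
    for (const line of new Set(text.split('\n'))) {
      counts.set(line, (counts.get(line) || 0) + 1);
    }
  }
  // Rebuild from one copy, dropping lines too few visitors agree on
  // (this also strips user-specific content like "Welcome back, Alice").
  return submissions[0].split('\n')
    .filter(line => (counts.get(line) || 0) >= minCount)
    .join('\n');
}
```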

aaaaaaaaaaaa

How about:

Site A creates a nonce (basically a random string) and sends it to your site B, which puts it into the session. When site A then makes the POST request from the page, it sends the nonce along with the request, and the request is accepted only if the nonce matches the one in site B's session.

Tower
  • Thanks, Kai, but I don't believe that solves the problem - it only makes the system more of a pain to hack (which still has some value). A malicious party could still forge the requests required here. The fundamental problem is that client-side code is not hidden, so no secrets can be kept. Once you figure out how the browser is communicating with the server, you can emulate the same, and the server is none the wiser. – Bungle Apr 03 '10 at 22:46