
I'm running a LAMP web server.

I'd like to include my script files on my page with:

<script src="http://domain.com/script.js"></script>

I would like visiting http://domain.com/script.js to display either an error, or a blank page.

I've seen other similar questions, of which the answer was "just obfuscate it", or "security by obfuscation is bad".

This isn't for the sake of security. I want to stop bots from pulling my code automatically. I'm okay with human users getting the code. I would simply like this as an alternative to obfuscation.

I've already attempted this with the use of base64-encoded $_GET and $_SESSION parameters. I'm wondering if there's a more elegant solution out there.


CLARIFICATION:

I am aware that the JavaScript is still available to the user. I am perfectly fine with the code being accessible via Firebug, Chrome's developer tools, etc. I simply want the code accessible via my <script> tags and inaccessible directly. This is not for security, and not to "hide" my code.


Clarification 2:

The reason I need this is because our company recently found a competitor running scripts to scrape data off of our site. I would like to be able to prevent the data from being scraped via their script, and force them to do it manually.

jperezov
    If you feel you must do this, look through these questions on preventing hotlinking of image files and apply the same methodology to .js. http://stackoverflow.com/search?q=htaccess+prevent+hotlink These methods are easily defeated by forging referer headers though. – Michael Berkowski Nov 07 '14 at 14:18
  • possible duplicate of [How can I block direct access to my JavaScript files?](http://stackoverflow.com/questions/6335644/how-can-i-block-direct-access-to-my-javascript-files) – Mark Nov 07 '14 at 14:18
  • That's rather rude. Just because you don't see a use for this, doesn't mean it doesn't exist. – jperezov Nov 07 '14 at 14:18
  • I don't think you really understand the architecture of how Web Servers serve up content, but what you are asking for is not really possible. – Mark Nov 07 '14 at 14:19
  • No matter what you do, the JavaScript will still be available to the user. – j08691 Nov 07 '14 at 14:20
  • @Mark That was a different question--I would like the _same_ file that is included via the <script> tag to be inaccessible when requested directly. – jperezov Nov 07 '14 at 14:20
  • If the browser can see it the user can see it. – I wrestled a bear once. Nov 07 '14 at 14:21
  • If you actually read through the answers, they more than address what you want (as not possible) and also provide some useful alternatives to get you closer to your completely unnecessary goal. – Mark Nov 07 '14 at 14:21
  • @Adelphia Wow, I was just going to post the exact same words... – jeroen Nov 07 '14 at 14:21
  • @j08691 My question isn't asking how to prevent the user from accessing my javascript. They can easily just use something like Firebug--and I'm fine with this. I'm asking, quite specifically, how to make the file accessible via my <script> tag, and inaccessible directly. – jperezov Nov 07 '14 at 14:21
  • @user3191820 Which shows a clear lack of understanding of how content is served up to the client from the server, which is why I am now downvoting your question. – Mark Nov 07 '14 at 14:22
  • "My question isn't asking how to prevent the user from accessing my javascript" ... "how to make the file ... inaccessible directly". I tend to preserve benefit of the doubt, but this is very slowly approaching trolling... – Katana314 Nov 07 '14 at 14:23
  • _"I'm asking, quite specifically, how to make the file accessible via my <script> tag, and inaccessible directly"_ – j08691 Nov 07 '14 at 14:23
  • @user3191820 **why** do you want to do this? Michael Berkowski's link meets your requirements, but I still can't see a use case – Steve Nov 07 '14 at 14:24
  • *"I'm asking, quite specifically, how to make the file accessible via my <script> tag, and inaccessible directly"* – T.J. Crowder Nov 07 '14 at 14:25
  • Do you want leech protection on your script? – Tibor B. Nov 07 '14 at 14:25
  • Having just seen clarification 2, I would say that you are probably best following a different approach. Scrapers are quite easy to identify, because they hit all your endpoints very quickly. Better to identify the offending IPs, and either block them entirely, or if you are feeling devious send them old/incorrect data. Other suggestions are either useless (a scraper will probably request the html before the js, like a regular browser; it might even **be** an automated browser) or inconvenience your real users (captchas are a pain). – Steve Nov 07 '14 at 19:57

5 Answers


I'm asking, quite specifically, how to make the file accessible via my <script> tag, and inaccessible directly.

Two options come to mind:

"Prevent hotlinking" Solutions

As @MichaelBerkowski pointed out, this is very similar to the common requirement of not allowing hotlinking of images, and the same sorts of solutions apply, with the same caveats and pitfalls. Basically, it's either of the following or both in combination:

  1. Checking REFERER (sic) headers on requests for your JavaScript files and denying those requests if REFERER doesn't refer to one of your pages.

  2. Remembering the IP addresses of machines that request your HTML pages for a brief time (say, up to a minute), and only allowing those IP addresses to download the JavaScript files, denying attempts from all other IP addresses.

The first is trivial, but also trivially bypassed. The second is a lot less trivial, but also readily bypassed (by simply issuing a request for the HTML and then disregarding the result), but does at least require that the request be made.
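
For illustration, here is a minimal sketch of the first option, assuming the script is served through a PHP wrapper rather than as a static file (the wrapper name, the domain, and the private path are placeholders):

<?php
// js-wrapper.php -- only serve the script when the Referer points at one of our own pages.
// Note: the Referer header is trivially forged, so this only deters naive bots.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$host = parse_url($referer, PHP_URL_HOST);

if ($host !== 'domain.com' && $host !== 'www.domain.com') {
    header('HTTP/1.1 403 Forbidden');
    exit;
}

header('Content-Type: application/javascript');
readfile('../private/script.js'); // actual file kept outside the public docroot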

Embedding the JavaScript in the HTML

An alternative to doing that is to use an Apache module to minify the script and inject it into your HTML file at the point you have your <script src="myfile.js"></script> tag, resulting in a <script>codehere</script> tag instead. Then there's no JavaScript file to request. This has the downside that the same JavaScript on multiple pages doesn't benefit from caching, but then again it has the upsides of A) not requiring a separate HTTP request, and B) making it impossible for people to download your JavaScript files (as you'd simply not host them anywhere externally visible at all).
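
The Apache module isn't strictly required; as a rough sketch, the same inlining can be done from PHP itself (the path and filename below are hypothetical):

<?php
// Instead of <script src="myfile.js"></script>, inline the file contents
// directly into the page, so there is no separate .js URL to request.
?>
<script>
<?php readfile('/var/www/private/myfile.js'); // file kept outside the public docroot ?>
</script>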


Neither of the above means people can't get access to your code, because fundamentally that's impossible (the best you can do is obfuscate, and de-obfuscators are pretty good; fundamentally if the browser can run your script, anyone can see it), but it's clear from the comments on the question that you understand that.

T.J. Crowder
  • @Mark: Indeed not. I think the comments made that abundantly clear, I figured the answer didn't benefit from going into it further. – T.J. Crowder Nov 07 '14 at 14:33
  • Thank you for actually answering my question seriously. This unfortunately doesn't solve my issue though--I've already got a test script that prints out the `$_SERVER` superglobal. Whether it's included, or accessed directly, the values are identical. – jperezov Nov 07 '14 at 14:33
  • @user3191820: What does the `$_SERVER` superglobal have to do with it? I'm talking about the `REFERER` header, and/or an in-memory DB of IP addresses, and/or embedding the script *in* the HTML. – T.J. Crowder Nov 07 '14 at 14:35
  • @T.J.Crowder the `$_SERVER` superglobal contains the `HTTP_REFERER` and the `REMOTE_ADDR` (IP address) values. They were identical in both cases (values being 'localhost' and '127.0.0.1', respectively, on my local machine). – jperezov Nov 07 '14 at 14:40
  • @user3191820: That's your local machine. Try it in a real production environment. A hotlink to your script file will have a different REFERER, and the REMOTE_ADDR should (provided all the layers are set up correctly) have the remote IP, not your local one. – T.J. Crowder Nov 07 '14 at 14:41
  • @T.J.Crowder You are correct, sir. I opted for a different solution though, as the amount of code it takes to bypass this is minimal. – jperezov Nov 07 '14 at 20:17

With your clarification #2 in mind, you might consider using PHP sessions.

You could first have the user hit a page that requires a captcha to proceed. Once the captcha is submitted and verified, a PHP session is started (or updated) with a boolean $isHuman that shows you are indeed dealing with a human.

Requests for scripts are directed to a php page that serves the script only if a session exists and $isHuman is true.
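
A minimal sketch of that gate, assuming the captcha handler elsewhere has already set the flag (the file names and the $isHuman session key are illustrative):

<?php
// script.php -- only serves the JavaScript to sessions that have passed the captcha
session_start();

if (empty($_SESSION['isHuman'])) {
    header('HTTP/1.1 403 Forbidden');
    exit; // no captcha passed: bots and direct visitors get nothing
}

header('Content-Type: application/javascript');
readfile('../js/main.js'); // actual script kept outside the webroot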

Jonathan M
  • This is much more likely to stop Data Scraping than what you (OP) are actually asking for. – Mark Nov 07 '14 at 14:40

As several people have tried to explain in the comments, this isn't really possible because the server has no way to know whether a JS file is being requested as part of the HTML page or on its own.

The closest you are going to get to achieving this is by creating a random string, appending it to your script URL when the HTML is generated, and checking for that string when the JS is requested.

This is how CAPTCHAs work, BTW.

In your HTML

<?php
session_start(); //start session

// Function to generate random str, borrowed from here: http://stackoverflow.com/questions/4356289/php-random-string-generator
function randStr() {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $randomString = '';
    for ($i = 0; $i < 10; $i++) $randomString .= $characters[rand(0, strlen($characters) - 1)];
    return $randomString;
}

// set random str to session variable
$_SESSION['JS_STR'] = randStr();

// append random string to JS file, which will have to have a .php extension
echo "<script src='myjavascriptfile.php?str=" . $_SESSION['JS_STR'] . "'></script>";

?>

In your JS

You will have to change your .js file to .php

<?php
session_start();

//check that the session variable matches the appended string
if(!isset($_SESSION['JS_STR']) || !isset($_GET['str']) || $_GET['str'] !== $_SESSION['JS_STR']) die("you don't have permission to view this");

//tell the browser you're serving some JS
header('Content-Type: application/javascript');

?>
window.alert("your JS goes here...");
I wrestled a bear once.

I opted to just pursue the $_SESSION/$_GET/$_POST-gated script I had started before visiting StackOverflow.

The solution's not perfect, but it suits my needs, in that the scripts are accessible via my <script> tags, but inaccessible directly. This is a simplified version of what I am doing:

File 1 is the PHP file generating the HTML page the user sees. This file creates a random value and stores it in the session. The script file, File 2, is included using this random value as a GET parameter.

File 1:

<?php
session_start();
$gate['first_gate'] = crypt((time() * mt_rand()) . 'salt');
$gate['second_gate'] = null;
$_SESSION['gate'] = json_encode($gate);
?>
<html>
    ...
    <!--this is just the HTML page including the script-->
    <script src="file_2.php?gate=<?=base64_encode(json_encode($gate))?>"></script>
    ...
</html>

File 2 is the PHP file functioning as a gate for the actual JavaScript code. It verifies that the randomized session variable is equal to the GET parameter, then grabs the code from File 3 using a POST request.

File 2:

 <?php
 session_start();
 $session_gate = json_decode($_SESSION['gate']);
 $get_gate = json_decode(base64_decode($_GET['gate']));
 //Exit if the session value != the get value
 if($get_gate->first_gate != $session_gate->first_gate) exit;

 //Set first gate to null to prevent re-visit
 $session_gate->first_gate = null;
 $session_gate->second_gate = crypt((time() * mt_rand()) . 'salt');
 $_SESSION['gate'] = json_encode($session_gate);
 header('Content-Type: application/javascript');
 ?>
 //This is visible via "view source" (then clicking on the script's URL)
 //Grab the actual JS file, hidden behind a POST "wall"
 $.post("file_3.php", { gate: '<?=base64_encode($_SESSION['gate'])?>' });

File 3 is inaccessible when directly viewing the page, as it exits without the POST data from File 2. Bots will still be able to ping it with a POST request, so some additional safety measures should be added here; a sketch of one such check follows the code below.

File 3:

 <?php
 session_start();
 //Exit without a POST request. Use a more specific value, other than
 //the $_POST superglobal by itself (just using $_POST for illustrative purposes)
 if(!$_POST) exit; //or print an error message
 $session_gate = json_decode($_SESSION['gate']);
 $post_gate = json_decode(base64_decode($_POST['gate']));
 //Exit if the session value != the POST value
 if($post_gate->second_gate != $session_gate->second_gate) exit;

 //Set both gates to null to prevent re-visit
 $session_gate->first_gate = null;
 $session_gate->second_gate = null;
 $_SESSION['gate'] = json_encode($session_gate);
 //Additional safety measures (such as IP address/HOST check) here, if desired
 header('Content-Type: application/javascript');
 ?>
 //Javascript code here
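
As one example of such an extra measure, the session could be pinned to the IP address that requested File 1. This is only a sketch: it assumes File 1 also stores $_SERVER['REMOTE_ADDR'] in the session, and the 'client_ip' key is hypothetical.

 <?php
 //Hypothetical extra check for File 3: reject the request if it does not come
 //from the same IP address that loaded the HTML page. File 1 would need to run:
 //    $_SESSION['client_ip'] = $_SERVER['REMOTE_ADDR'];
 //Beware: users behind proxies or on changing networks may fail this check.
 if(!isset($_SESSION['client_ip']) || $_SESSION['client_ip'] !== $_SERVER['REMOTE_ADDR']) exit;
 ?>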
jperezov
  • +1 for answering your own question. This could be simplified and improved however - see my answer – Steve Nov 07 '14 at 22:02

Following your answer, this is a simplified version of your solution:

<?php
//file1
session_start();
$token = uniqid();
$_SESSION['token'] = $token;
?>
<!--page html here-->
<script src="/js.php?t=<?php echo $token;?>"></script>

.

<?php
//js.php
session_start();
header('Content-Type: application/javascript');
$token = isset($_GET['t'])? $_GET['t'] : null;
if(!isset($_SESSION['token']) || $_SESSION['token'] != $token){
    //lets mess with them and inject some random js, in this case a random chunk of compressed jquery
    die('n=function(a,b){return new n.fn.init(a,b)},o=/^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g,p=/^-ms-/,q=/-([\da-z])/gi,r=function(a,b){return b.toUpperCase()};n.fn=n.prototype={jquery:m,constructor:n,selector:"",length:0,toArray:function(){return d.call(this)},get:function(a){return null!=a?0>a?this[a+this.length]:this[a]:d.call(this)},pushStack:function(a){var b=n.merge(this.constructor(),a);return b.prevObject=this,b.context=this.context,b},each:function(a,b){return n.each(this,a,b)},map:function(a){return this.pushStack(n.map(this,function(b,c){return a.call(b,c,b)}))},slice:function(){return this.pushStack(d.apply(this,arguments))},first:function(){return this.eq(0)},last:function(){return this.eq(-1)},eq:function(a){var b=this.length,c=+a+(0>a?b:0);return this.pushStack(c>=0&&b>c?[this[c]]:[])},end:function(){return this.prevObject||this.constructor(null)}');
}
//regenerate token, this invalidates current token
$token = uniqid();
$_SESSION['token'] = $token;
?>
$.getScript('js2.php?t=<?php echo $token;?>');

.

<?php
//js2.php
//much the same as before
session_start();
header('Content-Type: application/javascript');
$token = isset($_GET['t'])? $_GET['t'] : null;
if(!isset($_SESSION['token']) || $_SESSION['token'] != $token){
    //lets mess with them and inject some random js, in this case a random chunk of compressed jquery
    die('n=function(a,b){return new n.fn.init(a,b)},o=/^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g,p=/^-ms-/,q=/-([\da-z])/gi,r=function(a,b){return b.toUpperCase()};n.fn=n.prototype={jquery:m,constructor:n,selector:"",length:0,toArray:function(){return d.call(this)},get:function(a){return null!=a?0>a?this[a+this.length]:this[a]:d.call(this)},pushStack:function(a){var b=n.merge(this.constructor(),a);return b.prevObject=this,b.context=this.context,b},each:function(a,b){return n.each(this,a,b)},map:function(a){return this.pushStack(n.map(this,function(b,c){return a.call(b,c,b)}))},slice:function(){return this.pushStack(d.apply(this,arguments))},first:function(){return this.eq(0)},last:function(){return this.eq(-1)},eq:function(a){var b=this.length,c=+a+(0>a?b:0);return this.pushStack(c>=0&&b>c?[this[c]]:[])},end:function(){return this.prevObject||this.constructor(null)}');
}
unset($_SESSION['token']);
//get actual js file from a folder outside of the webroot, so it is never directly accessible, even if the filename is known
readfile('../js/main.js');

Note the main changes are:

  1. Simplifying the token system. As the token is in the page source, all it needs to do to function is to be unique; attempts to make it 'more secure' with encoding, salts, etc. do nothing.

  2. The actual js file is saved outside the web root, so it's not possible to access it directly even if you know the filename.

Please note that I still stand by my comment about IP banning bots. This solution will make scraping a lot harder, but not impossible, and could have unforeseen consequences for genuine visitors.
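
For completeness, here is a very rough sketch of that IP-based approach, using a file-based hit counter purely for illustration (a real setup would more likely use fail2ban, memcached/redis or firewall rules, and the thresholds below are made up):

<?php
//Crude per-IP rate limiter: refuse to serve an IP that makes more than
//$limit requests within $window seconds. Storage and numbers are illustrative only.
$ip     = $_SERVER['REMOTE_ADDR'];
$file   = sys_get_temp_dir() . '/hits_' . md5($ip);
$window = 60;   // seconds
$limit  = 100;  // requests allowed per window

$hits = is_file($file) ? json_decode(file_get_contents($file), true) : array();
$now  = time();
//keep only the timestamps that fall inside the current window
$hits = array_filter((array)$hits, function ($t) use ($now, $window) {
    return $t > $now - $window;
});
$hits[] = $now;
file_put_contents($file, json_encode(array_values($hits)));

if (count($hits) > $limit) {
    header('HTTP/1.1 429 Too Many Requests');
    exit;
}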

Steve