-2

Yesterday I went for an interview and they asked me two questions that I was unable to answer. Could anybody point me in the right direction on the following, please?

  1. Ways to prevent a website from being scraped.
  2. How to prevent SQL injection in a website?
user123456
  • 5
    Can we get a bit more context? And I think you mean "prevent SQL injection". – Joel A. Villarreal Bertoldi Apr 07 '16 at 18:10
  • @David wow that's so wrong – cmorrissey Apr 07 '16 at 18:12
  • 2
    @cmorrissey: I wish it were. But just browse the PHP tag here on Stack Overflow. It's overwhelming. – David Apr 07 '16 at 18:12
  • @David It makes no sense. Code here is meant to go straight to the issue raised by the OP; it's useless to add security code that makes the solution less clear. – AntoineB Apr 07 '16 at 18:16
  • 1
    @AntoineB: Solving a problem while introducing another problem is hardly helpful. – David Apr 07 '16 at 18:17
  • 1
    @AntoineB The point being that if you structure your code in such a way that *allows* SQL injection, you're basically adding it yourself. Using prepared statements with the right charset is really the only proper way of protecting yourself against it, and most guides (and a lot of answers here on SO) don't include that - instead there are user-inputs directly in the query, unsanitized. – Qirel Apr 07 '16 at 18:19
  • @David so I go to a website, copy the code `echo 'test';` and paste it somewhere, and that's an SQL injection? No, that's my point. – cmorrissey Apr 07 '16 at 18:27
  • @cmorrissey I think he was referring to code that actually does some sort of database interaction... There's quite a lot of `INSERT into db ... $_GET["whatever"]` code on SO. I guess people just get tired of pointing it out after a while. That, and as stated elsewhere, maybe it blurs the issue if the OP is asking about something else. – jDo Apr 07 '16 at 19:03

4 Answers

2
  1. Scraping is typically avoided by using a robots.txt file.
  2. They most likely asked about preventing SQL Injection, rather than adding it. This is done through input sanitization.
Community
  • 3
    Input sanitization is probably one of the worst ways to prevent SQL injection. – Ed Heal Apr 07 '16 at 18:13
  • 2
    Preventing SQL injection is about *not executing user input as code*. You don't have to "sanitize" inputs if you never execute them in the first place. – David Apr 07 '16 at 18:14
  • 2
    `robots.txt` only works if the other end is being nice. – Ed Heal Apr 07 '16 at 18:19
  • Ed Heal is spot on because only good bots follow the rules in a robots.txt file. Bad bots ignore the rules in a robots.txt file. A robots.txt file is not a security measure and is instead an SEO tool. – Ed-AITpro Apr 07 '16 at 19:01
1

Here are a few things you can do to prevent crawling/scraping:

  1. You can do some basic HTTP header validation.
  2. You can use third-party tools.
  3. You can use JS-rendered/dynamic content, which adds a layer of difficulty.
  4. You can use things like logins to restrict access to certain areas.
  5. You can use a robots.txt file to control search crawlers.
  6. You can also decorate your hyperlinks with the rel="nofollow" attribute.
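
As a rough illustration of the header validation in point 1, here is a minimal Python sketch. The function name `looks_automated`, the list of user-agent markers and the set of required headers are all assumptions made for this example, not a standard. Headers are trivial to spoof, so a check like this only filters out the laziest scrapers.

```python
# Hypothetical helper: flags requests whose headers look automated.
# The marker list and the required headers are illustrative assumptions.
SUSPICIOUS_AGENT_MARKERS = ("curl", "wget", "python-requests", "scrapy")
REQUIRED_HEADERS = ("user-agent", "accept", "accept-language")


def looks_automated(headers):
    """Return True if the request headers resemble a simple scraper."""
    # Header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}

    # Browsers normally send all of these; many naive scripts do not.
    for name in REQUIRED_HEADERS:
        if not normalised.get(name):
            return True

    # Look for well-known tool names in the User-Agent string.
    agent = normalised["user-agent"].lower()
    return any(marker in agent for marker in SUSPICIOUS_AGENT_MARKERS)
```

In a real application you would call something like this from your framework's request hook and combine it with rate limiting, rather than rely on it alone.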

For SQL injection protection:

  1. You can add layers of abstraction in front of your DB (an n-tiered application), so that the web application itself does not interact with the DB directly.
  2. Properly sanitize, encode and handle all user input.
  3. Do not rely on your own validation and sanitization; use the tools that have been put together by dev teams.
  4. Use unit testing in your application; make sure your application can handle all types of input and fails safe.
  5. Ensure you are not throwing verbose error messages directly from the database.
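
To make point 5 concrete, here is a small Python sketch using the standard `sqlite3` module. The `users` table, the `find_user` function and the wording of the generic message are assumptions for the example. The idea is simply that the database error is logged on the server while the caller only ever sees a generic message.

```python
import logging
import sqlite3

logger = logging.getLogger("app.db")


def find_user(conn, username):
    """Return (ok, payload) without leaking database error text."""
    try:
        # The value is bound as a parameter, not pasted into the SQL text.
        row = conn.execute(
            "SELECT id, name FROM users WHERE name = ?", (username,)
        ).fetchone()
    except sqlite3.Error:
        # Full details (including the traceback) go to the server log only.
        logger.exception("user lookup failed")
        return False, "Something went wrong. Please try again later."
    return True, row
```

An attacker probing the site then gets no table names, column names or SQL fragments back to help refine an injection attempt.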
Mark
0
  1. The best way to prevent scraping is to make the web pages differ in unpredictable ways each time they are loaded, for example by giving tags unique identifiers on every page load.
  2. Do not concatenate user input into dynamic SQL. Use bound parameters instead.
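
Here is a minimal sketch of the difference in point 2, using Python's built-in `sqlite3` module with an in-memory database. The `users` table and the sample input are made up for the example. Other databases and languages use different placeholder syntax, but the principle is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# Input an attacker might submit in a form field.
malicious = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so the OR clause runs
# and every row matches.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a value and is never parsed as SQL,
# so it is compared literally against the name column.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(unsafe_rows)  # [('alice',)]
print(safe_rows)    # []
```

The concatenated query returns a row even though no user has that name, while the bound query returns nothing.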
Ed Heal
0

You can use htaccess files/code to block bad bots/scrapers and protect against SQL Injection.

  1. See this link: http://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess.html for an extensive list of bad bots/scrapers and the htaccess code to use.

2.

    # SQL Injection Protection: Block common SQL commands used in Query Strings
    RewriteCond %{QUERY_STRING} (;|<|>|'|"|\)|%0A|%0D|%22|%27|%3C|%3E|%00).*(/\*|union|select|insert|drop|delete|update|cast|create|char|convert|alter|declare|order|script|set|md5|benchmark|encode) [NC]
    RewriteRule ^(.*)$ - [F]
Ed-AITpro