
Is it possible to ban certain user agents directly from web.config? Certain robots seem not to follow robots.txt, and to avoid pointless server load (and log-file spamming) I'd like to prevent certain classes of request (in particular based on user-agent, or perhaps IP address) from proceeding.

Bonus points if you know whether it's similarly possible to prevent such requests from being logged to IIS's log file entirely (i.e. if request matches, forward to /dev/null, if you get my meaning).

A solution for win2003 would be preferable, but this is a recurring problem - if there's a clean solution for IIS7 but not IIS6, I'd be happy to know it.

Edit: Sorry 'bout the incomplete question earlier, I had tab+entered accidentally.

Eamon Nerbonne

3 Answers


This can be done fairly easily using the URL Rewrite module in IIS7, though I don't know whether it will also prevent those requests from being logged.

 <rewrite>
   <rules>
     <!-- Abort any request whose User-Agent matches RogueBotName
          AND whose URL appears in the MyPrivatePages map below;
          both conditions must match (the default is MatchAll). -->
     <rule name="Ban user-agent RogueBot" stopProcessing="true">
       <match url=".*" />
       <conditions>
         <add input="{HTTP_USER_AGENT}" pattern="RogueBotName" />
         <add input="{MyPrivatePages:{REQUEST_URI}}" pattern="(.+)" />
       </conditions>
       <!-- AbortRequest drops the connection without sending a response -->
       <action type="AbortRequest" />
     </rule>
   </rules>
   <rewriteMaps>
     <rewriteMap name="MyPrivatePages">
       <add key="/PrivatePage1.aspx" value="block" />
       <add key="/PrivatePage2.aspx" value="block" />
       <add key="/PrivatePage3.aspx" value="block" />
     </rewriteMap>
   </rewriteMaps>
 </rewrite>
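
For context, a minimal sketch of where this rule set sits in web.config, assuming the URL Rewrite module is installed on the server (the rules and rewriteMaps above go inside the rewrite element):

 <configuration>
   <system.webServer>
     <rewrite>
       <!-- rules and rewriteMaps as above -->
     </rewrite>
   </system.webServer>
 </configuration>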
Albert Walker
  • Well, the site is small enough that the IIS log isn't a perf problem; it's mostly just noise I wouldn't mind avoiding - but this solution is exactly what I was hoping for - some configurable module that can abort certain requests. I'll look into it, thanks! – Eamon Nerbonne Jul 24 '09 at 08:38
  • Do you know if it's possible to create one rule for multiple bots? – UpTheCreek Feb 11 '11 at 09:14
  • @UpTheCreek It is a pattern, so as long as you have a regular expression that identifies all the bots you want to block, you can plug it in there. I would be careful about that, though; a too-broad pattern can easily kill normal traffic. – Zachary Dow Jul 24 '15 at 15:19
  • @UpTheCreek Modifying the above code like this would cover most cases. Obviously test thoroughly before putting it anywhere live: – Zachary Dow Jul 24 '15 at 15:25
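
A hedged sketch of that modification (the bot names here are placeholders, not real agents; substitute the ones from your own logs): replace the single user-agent condition in the rule above with an alternation, e.g.

 <add input="{HTTP_USER_AGENT}" pattern="(BadBot|EvilScraper|OtherBot)" />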

You could write a custom ASP.NET HttpModule, as I did for my site, to ban some rogue bots. Here's the code:

using System;
using System.Configuration;
using System.Text.RegularExpressions;
using System.Web;

public class UserAgentBasedRedirecter : IHttpModule
{
    private static readonly Regex _bannedUserAgentsRegex;
    private static readonly string _bannedAgentsRedirectUrl;

    static UserAgentBasedRedirecter()
    {
        // Where matched bots get sent; falls back to a URL that will 404.
        _bannedAgentsRedirectUrl = ConfigurationManager.AppSettings["UserAgentBasedRedirecter.RedirectUrl"];
        if (String.IsNullOrEmpty(_bannedAgentsRedirectUrl))
            _bannedAgentsRedirectUrl = "~/Does/Not/Exist.html";

        // If no regex is configured, the module does nothing.
        string regex = ConfigurationManager.AppSettings["UserAgentBasedRedirecter.UserAgentsRegex"];
        if (!String.IsNullOrEmpty(regex))
            _bannedUserAgentsRegex = new Regex(regex, RegexOptions.IgnoreCase | RegexOptions.Compiled);
    }

    #region Implementation of IHttpModule

    public void Init(HttpApplication context)
    {
        context.PreRequestHandlerExecute += RedirectMatchedUserAgents;
    }

    private static void RedirectMatchedUserAgents(object sender, EventArgs e)
    {
        HttpApplication app = sender as HttpApplication;

        if (_bannedUserAgentsRegex != null &&
            app != null && app.Request != null && !String.IsNullOrEmpty(app.Request.UserAgent))
        {
            if (_bannedUserAgentsRegex.Match(app.Request.UserAgent).Success)
            {
                // Redirect ends the response for matched agents.
                app.Response.Redirect(_bannedAgentsRedirectUrl);
            }
        }
    }

    public void Dispose()
    { }

    #endregion
}

You'll need to register it in web.config and specify the regular expression used to match user-agent strings. Here's one I used to ban msnbot/1.1 traffic:

<configuration>
    <appSettings>
        <add key="UserAgentBasedRedirecter.UserAgentsRegex" value="^msnbot/1.1" />
    </appSettings>
    ...
    <system.web>
        <httpModules>
            <add name="UserAgentBasedRedirecter" type="Andies.Web.Traffic.UserAgentBasedRedirecter, Andies.Web" />
        </httpModules>
    </system.web>
</configuration>
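
Note that on IIS7 in integrated pipeline mode, managed modules are registered under system.webServer rather than system.web; a sketch, assuming the same type name:

 <configuration>
   <system.webServer>
     <modules>
       <add name="UserAgentBasedRedirecter" type="Andies.Web.Traffic.UserAgentBasedRedirecter, Andies.Web" />
     </modules>
   </system.webServer>
 </configuration>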
emertechie
  • This looks even more like what I was looking for :-) thanks! Do you happen to know if this prevents requests from being logged? Probably not, right? – Eamon Nerbonne Nov 30 '09 at 08:32
  • Haven't checked, but I would imagine that, seeing as this has already gone through the ASP.Net pipeline, it's already in the logs – emertechie Nov 30 '09 at 11:19

I don't think you can do this from web.config (authorisation in web.config is for users, not bots). Your best bet would be some kind of custom ISAPI filter for IIS itself. There's a blog post about this here. Good luck!

Dan Diplo