I am trying to scrape a basic ASP.NET directory website that uses paging.

The website has more than 50 pages, and any one page shows at most 10 paging links.

I'm using Fiddler to help replicate all the parameters, variables, form fields, cookies etc. that the browser posts. The only difference I see between two posts is the __EVENTVALIDATION value.

With HttpWebRequest I always get the same value, whereas in the browser it changes on every click.

With HttpWebRequest I get the first 10 pages correctly, but every page after that redirects me to the home page. Below is the postback JavaScript, which is always the same for the links after the first 10:

javascript:__doPostBack('CT_Main_2$gvDirectorySearch$ctl53$ctl00$ctl11','')

Any ideas why __EVENTVALIDATION does not change with HttpWebRequest?

Aydin
Jim
  • possible duplicate http://stackoverflow.com/questions/2449328/how-do-i-scrape-information-off-asp-net-websites-when-paging-and-javascript-links – EugenSunic Apr 19 '15 at 08:17
  • Unfortunately it's not the same question. If you notice, in my case I don't have the Page$2 argument, as in :__doPostBack('gvEmployees','Page$2')">2. I believe that the difference between pages is determined via the EventValidation field – Jim Apr 19 '15 at 08:38

1 Answer

From your description, it sounds like an anti-forgery token, which is used to prevent cross-site request forgery (XSRF) attacks.

For a site to use anti-forgery tokens, it typically sets a cookie in the client's browser and expects the very same token as a parameter in the form being posted.

To get past it, you'll need to send the token set by the server back on the subsequent request. You'll also need to scan the HTML form for the same token and include that as well.


EDIT

So I've dug a little deeper and created an ASP.NET WebForms site and tried to replicate your issue but couldn't... on each request I managed to extract the __EVENTVALIDATION field.

Still, here's my code if you find any of it useful...

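// Namespaces used: System.Collections.Generic, System.IO, System.Linq, System.Net,
// System.Text, System.Text.RegularExpressions, System.Web (for HttpUtility).
void Main()
{
    // GET the login page and capture its hidden WebForms fields.
    var firstResponse = this.Get(@"http://localhost:7428/Account/Login");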
    
    firstResponse.FormValues["ctl00$MainContent$Email"] = "email@example.com";
    firstResponse.FormValues["ctl00$MainContent$Password"] = "password";

    string secondRequestPostdata = firstResponse.ToString();
    var secondResponse = this.Post(@"http://localhost:7428/Account/Login", secondRequestPostdata);
    
    Console.WriteLine (firstResponse.FormValues["__EVENTVALIDATION"]);
    Console.WriteLine (secondResponse.FormValues["__EVENTVALIDATION"]);
}


// Fetches the page at `uri` and parses its form fields.
public FormData Get(string uri)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(stream))
    {
        return new FormData(reader.ReadToEnd());
    }
}

// Posts the URL-encoded form body to `uri` and parses the form fields of the response.
public FormData Post(string uri, string postContent)
{
    byte[] formBytes = Encoding.UTF8.GetBytes(postContent);
    
    var request = (HttpWebRequest)WebRequest.Create(uri);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = formBytes.Length;
    
    using (Stream stream = request.GetRequestStream())
    {
        stream.Write(formBytes, 0, formBytes.Length);
    }
    
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(stream))
    {
        return new FormData(reader.ReadToEnd());
    }
}

public class FormData
{
    public FormData(string html)
    {
        this.Html = html;
    
        this.FormValues = new Dictionary<string, string>();
        this.FormValues["__EVENTTARGET"]                = this.Extract(@"__EVENTTARGET");
        this.FormValues["__EVENTARGUMENT"]              = this.Extract(@"__EVENTARGUMENT");
        this.FormValues["__VIEWSTATE"]                  = this.Extract(@"__VIEWSTATE");
        this.FormValues["__VIEWSTATEGENERATOR"]         = this.Extract(@"__VIEWSTATEGENERATOR");
        this.FormValues["__EVENTVALIDATION"]            = this.Extract(@"__EVENTVALIDATION");
        this.FormValues["ctl00$MainContent$Email"]      = string.Empty;
        this.FormValues["ctl00$MainContent$Password"]   = string.Empty;
        this.FormValues["ctl00$MainContent$ctl05"]      = "Log in";
    }
    
    public string Html { get; set; }
    
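    // Returns the value attribute of the input with the given id, or an empty string if it is absent.
    private string Extract(string id)
    {
        return Regex.Match(this.Html, @"id=""" + Regex.Escape(id) + @""" value=""([^""]*)")
                    .Groups[1]
                    .Value;
    }
    
    public Dictionary<string, string> FormValues { get; set; }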
    
    public override string ToString()
    {
        var formData = this.FormValues.Select(form => HttpUtility.UrlEncode(form.Key) + "=" + HttpUtility.UrlEncode(form.Value));
                        
        return string.Join("&", formData);
    }
}
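
For the paging case in the question, the same idea can be generalised a little. What follows is only a minimal sketch under some assumptions: the names `PostBackForm`, `HiddenFields` and `BuildBody` are made up for illustration, and the regex-based parsing assumes double-quoted attributes as WebForms normally renders them. It collects every hidden input from the most recent response (not just a fixed list) and sets `__EVENTTARGET` / `__EVENTARGUMENT` to the two arguments of the `__doPostBack(...)` link being "clicked".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class PostBackForm
{
    // Matches every <input ...> tag; attribute order inside the tag is not assumed.
    private static readonly Regex InputTag =
        new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Reads one double-quoted attribute from a tag, or returns null if it is missing.
    private static string Attr(string tag, string name)
    {
        Match m = Regex.Match(
            tag,
            @"\s" + Regex.Escape(name) + @"\s*=\s*""([^""]*)""",
            RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Collects name/value pairs of all hidden inputs (__VIEWSTATE, __EVENTVALIDATION, ...).
    public static Dictionary<string, string> HiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in InputTag.Matches(html))
        {
            string tag = m.Value;
            string name = Attr(tag, "name");
            bool isHidden = string.Equals(
                Attr(tag, "type"), "hidden", StringComparison.OrdinalIgnoreCase);
            if (name == null || !isHidden)
                continue;
            fields[name] = Attr(tag, "value") ?? string.Empty;
        }
        return fields;
    }

    // Builds the URL-encoded POST body for __doPostBack(eventTarget, eventArgument),
    // using the hidden fields of the page that contained the link.
    public static string BuildBody(string html, string eventTarget, string eventArgument)
    {
        Dictionary<string, string> fields = HiddenFields(html);
        fields["__EVENTTARGET"] = eventTarget;
        fields["__EVENTARGUMENT"] = eventArgument;
        return string.Join("&", fields.Select(
            f => WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }
}
```

The important part is that `html` is always the body of the previous response, so the hidden fields are re-read before every POST rather than reused from the first GET. If the session cookie matters, the same `CookieContainer` instance can be assigned to `request.CookieContainer` on every `HttpWebRequest`. Any visible form fields the page expects would still have to be added to the body separately.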
Aydin
  • Hmm, I don't see any additional parameter in Fiddler that resembles an anti-forgery token, and I am already getting and passing all cookies with the post. I just don't get why my EventValidation value is not changing. – Jim Apr 19 '15 at 10:20
  • So when you fire off your first `HttpWebRequest`... when observing Fiddler, is the server not setting a token named `__EVENTVALIDATION` on the response? – Aydin Apr 19 '15 at 10:24
  • In the first step I do a GET and grab the original cookies and tokens, which also sets up __EVENTVALIDATION correctly. Using that info I do a POST and get proper results (as described above), despite the fact that my EVENTVALIDATION has the same value during all posts. The problem starts when I hit the 11th page (which was not in the original webpage). After that I am always redirected to the home page. – Jim Apr 19 '15 at 10:37
  • Yes and that's where the problem is, once you send the `HttpWebRequest`, you're not reading the body for a ___new___ token, you need to extract that token and use it on the request that follows it. – Aydin Apr 19 '15 at 11:29
  • Hmm, but I am always reading it :) and it's always the same, which is why I am curious. Also, I have noticed that I am passing only one cookie, ASP.NET_SessionId=bbbbbbb, whereas there are a few others if I do it via the browser: Cookie: __atuvc=; _utma=; __utmz=.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); – Jim Apr 19 '15 at 11:34
  • Check out my updated answer, hopefully it'll help you – Aydin Apr 19 '15 at 12:24
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/75635/discussion-between-jim-and-aydin-adn). – Jim Apr 19 '15 at 12:37
  • I really appreciate your effort. Perhaps I was not very clear in my description, but what I am trying to say is that I also get the EVENTVALIDATION each time; the issue is that it is the same each time :) I am not sure if I can contact you to show you the exact situation... really appreciate your effort – Jim Apr 19 '15 at 12:41
  • No probs, I'm in chat atm, we can carry on there :) – Aydin Apr 19 '15 at 12:43
  • I really appreciate your help, a true professional and nice guy thx – Jim Apr 19 '15 at 19:33
  • This is a great little piece of code. Worked perfectly the first time I tried it. – Robert Harvey Oct 23 '15 at 02:29
  • @Jim I tried to scrape an ASP.NET site's pagination, and after the first page with 20 pages, I got 1 but expected 21. Did you find a solution? – Denis Anisimov Sep 29 '20 at 10:01