I have a system that copies data from other databases to my own SQL Server so I can format the data to suit my needs. That data is used to populate dropdowns, and the user's selections end up as a variable that returns the rows I want to show.

All of the databases I've encountered so far stored the dropdown data in a logical way. But now I have one that does not.

The existing website has a reasonably logical dropdown of its own, and I want to use it to get the data I need. Selecting an option in that dropdown fills a second, multiple-selection dropdown.

I want to select each option in the first dropdown in turn and then read the data it puts in the second, multiple-selection dropdown. Is there a way to do this?

GMBrian

1 Answer

What you essentially want to do is web scraping. In a similar project, this is what I did:

  1. Use the .NET classes that can send HTTP requests and receive HTTP responses. In my case, I used WebClient:

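        // Requires: using System.Net; using System.Collections.Specialized;
        // URL and postData are defined elsewhere (see step 2 for postData).
        byte[] response = null;
        using (var client = new WebClient())
        {
            try
            {
                // Sends postData as a form-encoded POST body and returns the raw response.
                response = client.UploadValues(URL, "POST", postData);
            }
            catch (Exception ex)
            {
                // Log error here; response stays null when the request fails
            }
        }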
    
  2. See the postData? This is a NameValueCollection that contains the data the page sends when it does a POST. Use Fiddler or a similar tool to find out which fields need to be included. If the page uses ASP.NET Web Forms, the __VIEWSTATE needs special handling: make a sample request first and read the __VIEWSTATE value from the response. Here's a sample for postData:

        // Empty strings are placeholders; use the values captured from a real request.
        var postData = new NameValueCollection();
        postData.Add("__EVENTTARGET", "");
        postData.Add("__EVENTARGUMENT", "");
        postData.Add("__VIEWSTATE", "");
    
  3. Use HtmlAgilityPack to parse the response that you get back from the POST. With an XPath query, you will be able to get the new items in the second dropdown.

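        // Requires: using System.IO; using HtmlAgilityPack;
        var doc = new HtmlDocument();
        // response is the byte[] returned by UploadValues in step 1
        doc.Load(new MemoryStream(response));
        return doc;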
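
Put together, the three steps could look roughly like the sketch below. It is not a tested solution for any particular site, and several things in it are assumptions: `CookieAwareWebClient`, `DropdownScraper`, `ReadHiddenField`, `ReadOptions` and `FetchSecondDropdown` are names made up for illustration, `dlFilter` is the first dropdown's field name as it appears in the comments below, and `dlObject` is a purely hypothetical name for the second dropdown. The cookie handling follows the idea from the CookieContainer question linked in the comments. A real page will probably need more form fields copied from the Fiddler capture.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.IO;
using System.Net;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

// WebClient that reuses one CookieContainer, so the GET and the POST
// are sent with the same session cookie.
class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var http = request as HttpWebRequest;
        if (http != null)
        {
            http.CookieContainer = cookies;
        }
        return request;
    }
}

static class DropdownScraper
{
    // Returns the value of a hidden <input> such as __VIEWSTATE, or "" if it is absent.
    public static string ReadHiddenField(HtmlDocument doc, string name)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//input[@name='" + name + "']");
        return node == null ? "" : HtmlEntity.DeEntitize(node.GetAttributeValue("value", ""));
    }

    // Returns the value attribute of every <option> directly under the named <select>.
    public static List<string> ReadOptions(HtmlDocument doc, string selectName)
    {
        var values = new List<string>();
        HtmlNodeCollection options =
            doc.DocumentNode.SelectNodes("//select[@name='" + selectName + "']/option");
        if (options == null)
        {
            return values; // SelectNodes returns null when nothing matches
        }
        foreach (HtmlNode option in options)
        {
            values.Add(HtmlEntity.DeEntitize(option.GetAttributeValue("value", "")));
        }
        return values;
    }

    // GETs the page, posts back one first-dropdown value, and reads the second dropdown.
    public static List<string> FetchSecondDropdown(string url, string filterValue)
    {
        using (var client = new CookieAwareWebClient())
        {
            var page = new HtmlDocument();
            page.Load(new MemoryStream(client.DownloadData(url)));

            var postData = new NameValueCollection();
            postData.Add("__EVENTTARGET", "dlFilter"); // control that triggers the postback
            postData.Add("__EVENTARGUMENT", "");
            postData.Add("__VIEWSTATE", ReadHiddenField(page, "__VIEWSTATE"));
            postData.Add("__EVENTVALIDATION", ReadHiddenField(page, "__EVENTVALIDATION"));
            postData.Add("dlFilter", filterValue);

            var result = new HtmlDocument();
            result.Load(new MemoryStream(client.UploadValues(url, "POST", postData)));
            return ReadOptions(result, "dlObject"); // hypothetical name of the second dropdown
        }
    }
}
```

To walk the whole first dropdown, you would call `ReadOptions(page, "dlFilter")` on the initial page and then call `FetchSecondDropdown` once per value. The sketch reads the `value` attribute rather than the option text because some HtmlAgilityPack versions do not treat the text as a child of `<option>`.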
    
rikitikitik
  • I've got it working now but I keep getting a session timeout. Do you know what could be causing it? – GMBrian Jun 08 '12 at 12:05
  • Where exactly are you getting it? – rikitikitik Jun 09 '12 at 00:01
  • When I send the POST with the event target filled in, I get this response: "Your current session has expired. Please click below to begin again on restart." – GMBrian Jun 09 '12 at 10:54
  • Does your post data contain everything that should be sent over? – rikitikitik Jun 09 '12 at 16:14
  • I've got all the data from Fiddler, but when I fill in the event target it gives that response. – GMBrian Jun 10 '12 at 12:57
  • I've taken a look at the site and it looks like cookie information needs to be sent over as well. Take a look at this if you're using a WebClient: http://stackoverflow.com/questions/1777221/using-cookiecontainer-with-webclient-class – rikitikitik Jun 12 '12 at 02:55
  • I still can't get it right. What I do is: I make a request to the URL and get the `__VIEWSTATE` and the `__EVENTVALIDATION` from the response. Then I make a new request (WebClient) with the following post data: – GMBrian Jun 18 '12 at 09:20
  • `postData.Add("__EVENTTARGET", "LinkBtn_studentsets"); postData.Add("__EVENTARGUMENT", ""); postData.Add("__VIEWSTATE", viewstate); postData.Add("__EVENTVALIDATION", EVENTVALIDATION); postData.Add("tLinkType", "staff"); postData.Add("dlFilter", "DEM/DL"); postData.Add("tWildcard", ""); postData.Add("lbWeeks", "t"); postData.Add("RadioType", "individual;staffbydayurl;staffbydayreport");` – GMBrian Jun 18 '12 at 09:23
  • But the response still gives back all the data from the second field, not the filtered data. – GMBrian Jun 18 '12 at 09:25
  • Have you tried other values for `dlFilter` and the results are the same? Try `DBSV` and see if that works fine. – rikitikitik Jun 19 '12 at 02:59
  • Yes, but the results are the same, or an error. I tried lots of things, but it won't do the postback. The only element I can change is the Wildcard element, by adding a filter to the query. – GMBrian Jun 19 '12 at 08:29
  • What is the dlFilter value when you got the error? What is the error? – rikitikitik Jun 19 '12 at 08:31
  • It redirects to `swsError.aspx?aspxerrorpath=%2f2011sm2nl%2fDefault.aspx`. That happens when I set the event target or use a wrong dlFilter value. – GMBrian Jun 19 '12 at 08:47
  • Change your `__EVENTTARGET` to `dlFilter` and see how it goes. – rikitikitik Jun 19 '12 at 10:08
  • With `postData.Add("__EVENTARGUMENT", "LinkBtn_studentsets"); postData.Add("__EVENTTARGET", "dlFilter"); postData.Add("dlFilter", "DBSV");` the response is the same error. With `postData.Add("__EVENTARGUMENT", "LinkBtn_studentsets"); postData.Add("dlFilter", "DBSV");` the response is all the fields, not filtered. When I do a postback from the site, Fiddler tells me: `__EVENTTARGET=dlFilter&__EVENTARGUMENT=&__LASTFOCUS=&__VIEWSTATE=%2FwEPDw,.......&__EVENTVALIDATION=...........%2F&tLinkType=studentsets&dlFilter=DEM%2FDL%2FA&tWildcard=&lbWeeks=t&RadioType=individual%3Bstudentsetbydayurl%3Bstudentsetbydayreport` – GMBrian Jun 19 '12 at 10:21
  • That last one was what I was getting as well. Did that result in an error? – rikitikitik Jun 20 '12 at 00:34
  • An error, but when I replace the URL-encoded `%2F` with `/` it returns the full field. When I set the event target to dlFilter it returns the default page, not the schedule page. – GMBrian Jun 20 '12 at 07:24