
There is an online form (https://servizi.ivass.it/RuirPubblica/) where you can run a search (a blank search works). For each result, I need to click on it and export the list in the 5th table of its details page.

So basically I want to write a program that does this for me:

  1. Submit a search with my own criteria
  2. Access each page of the result items
  3. Access each item detail page
  4. Obtain the rows in the 5th table so that I can append them to a list

Using Fiddler, I checked which parameters were used in the POST request when I clicked the "Search" button, and tried to do the same with .NET. If I access the base address with HttpClient, it returns the correct HTML of the search form, but when I submit the following POST request with the search parameters, I get a web page showing the error "Warning: Session Expired".

This also happens if I make the search POST call alone, without first accessing the home page, so I'm not sure it is related to keeping the session alive between the two requests.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using System.Windows;

public partial class MainWindow : Window
{
    private readonly HttpClient client;

    public MainWindow()
    {
        InitializeComponent();

        // Cookie jar shared by every request made through this client
        var cookieJar = new CookieContainer();
        var handler = new HttpClientHandler
        {
            CookieContainer = cookieJar,
            UseCookies = true,
            UseDefaultCredentials = false
        };

        client = new HttpClient(handler)
        {
            BaseAddress = new Uri("https://servizi.ivass.it/RuirPubblica/Search.faces")
        };
    }

    private async Task TryHttp()
    {
        // Access the search page
        var response = await client.GetAsync(client.BaseAddress);
        var responseString = await response.Content.ReadAsStringAsync();

        // Perform the search (field names and values copied from the Fiddler capture)
        var values = new Dictionary<string, string>
        {
            { "FormSearch", "FormSearch" },
            { "FormSearch:j_id_jsp_558348152_13", "PG" },
            { "FormSearch:j_id_jsp_558348152_16", "custom" },
            { "FormSearch:SecE", "on" },
            { "FormSearch:matricola", "" },
            { "FormSearch:ragioneSociale", "" },
            { "FormSearch:provincia", "NA" },
            { "FormSearch:SearchButton", "Ricerca" },
            { "javax.faces.ViewState", "j_id1:j_id5" }, // hard-coded, taken from the captured request
        };

        var content = new FormUrlEncodedContent(values);
        response = await client.PostAsync(client.BaseAddress, content);

        // Here I'm getting a web page showing the error "Warning: Session expired"
        responseString = await response.Content.ReadAsStringAsync();
    }

    // async void is acceptable for event handlers; awaiting lets exceptions from TryHttp surface
    private async void ButtonBase_OnClick(object sender, RoutedEventArgs e)
    {
        await TryHttp();
    }
}
```
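A likely cause, offered as an assumption rather than something verified against this site: `Search.faces` is a JSF page, and a ViewState such as `j_id1:j_id5` looks like server-side state saving, where the token only points to a view stored in the server session that rendered the form. A value copied from a Fiddler capture belongs to the browser's session, so the server cannot find that view and reports the session as expired, which would also explain why skipping the GET makes no difference. This sketch reads the hidden `javax.faces.ViewState` field from the GET response and posts it back through the same `HttpClient`, so the session cookie set by the GET is sent too. `JsfSearch`, `ExtractViewState` and `SearchAsync` are hypothetical names, and the regex assumes the hidden input renders `name` before `value`.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class JsfSearch
{
    // Captures the value of <input name="javax.faces.ViewState" ... value="...">.
    // Assumption: the name attribute appears before the value attribute.
    private static readonly Regex ViewStateRegex = new Regex(
        "name=\"javax\\.faces\\.ViewState\"[^>]*?value=\"([^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the ViewState token of a JSF page, or null if absent.
    public static string ExtractViewState(string html)
    {
        Match match = ViewStateRegex.Match(html);
        return match.Success ? WebUtility.HtmlDecode(match.Groups[1].Value) : null;
    }

    // GET the form, then POST the search through the same client (same cookie jar).
    public static async Task<string> SearchAsync(
        HttpClient client, Uri searchUri, IDictionary<string, string> searchFields)
    {
        string formHtml = await client.GetStringAsync(searchUri);
        string viewState = ExtractViewState(formHtml)
            ?? throw new InvalidOperationException("No javax.faces.ViewState field found.");

        // Copy the caller's fields; add the fresh ViewState or overwrite a hard-coded one.
        var values = new Dictionary<string, string>(searchFields)
        {
            ["javax.faces.ViewState"] = viewState
        };

        HttpResponseMessage response =
            await client.PostAsync(searchUri, new FormUrlEncodedContent(values));
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

From `TryHttp` this could be called as `await JsfSearch.SearchAsync(client, client.BaseAddress, values)`; the hard-coded ViewState entry in `values` is simply overwritten. Generated ids such as `FormSearch:j_id_jsp_558348152_13` may also change between deployments, so reading them from the same GET response may be more robust.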
  • Stack Overflow is not the best place to ask "how do I get started with..." questions. You need to do your own research and ask questions here *after* you have tried on your own. Please read the "How to Ask" page. – Camilo Terevinto May 30 '18 at 11:42
  • You could get started by using a proxy like Fiddler and watching the actual requests that are made. Then you can look at replicating those in C#. Once you can get the pages, you can use something like the [HtmlAgilityPack](https://www.nuget.org/packages/HtmlAgilityPack/) to parse the DOM and extract the values. If you get stuck on a specific step, by all means ask another question; this one is too broad. – stuartd May 30 '18 at 11:45
  • Thank you @Camilo. I will start trying some code on my own, but it's a wide topic, and if someone more experienced could take a quick look at the website and just tell me whether what I need is doable, that would save me a lot of wasted time. Thank you for editing my question too. – user3420936 May 30 '18 at 11:49
  • Thank you @stuartd for your great advice! Fiddler looks like a promising tool to get started with! – user3420936 May 30 '18 at 11:50
  • Just added some more details after my first test with C#. For some reason I can't get the search request to return results. Not sure if it's because of some session management I'm not performing, or because I'm using the wrong parameters for the POST call. – user3420936 May 31 '18 at 09:38

1 Answer


If you can define it, it can be done. As you will understand from the comments, Stack Overflow is all about programming questions, so I will only help you with that part.

In principle, if the web page can be parsed as HTML and communicates over HTTP, you can do anything with it that a normal web browser would do. At first glance, the website you reference does not appear to do anything out of the ordinary.

HtmlAgilityPack can be very useful for parsing the DOM and for navigating and extracting its contents.
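For step 4 of the question (the rows of the 5th table), a minimal HtmlAgilityPack sketch. It assumes the detail page's HTML has already been downloaded, and that "the 5th table" means the fifth `<table>` element in document order, which is what the XPath `(//table)[5]` selects; nested tables would shift that count. `DetailPageParser` and `ExtractRowsFromFifthTable` are hypothetical names, and the code needs the HtmlAgilityPack NuGet package.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class DetailPageParser
{
    // Returns the trimmed text of each cell, row by row, from the 5th <table> in the page.
    public static List<List<string>> ExtractRowsFromFifthTable(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();

        // XPath positions are 1-based: (//table)[5] is the fifth table in document order.
        HtmlNode table = doc.DocumentNode.SelectSingleNode("(//table)[5]");
        if (table == null)
            return rows; // the page has fewer than five tables

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rowNodes = table.SelectNodes(".//tr");
        if (rowNodes == null)
            return rows;

        foreach (HtmlNode row in rowNodes)
        {
            HtmlNodeCollection cells = row.SelectNodes("./th|./td");
            if (cells == null)
                continue;

            // Decode entities such as &amp; and trim surrounding whitespace.
            rows.Add(cells
                .Select(cell => HtmlEntity.DeEntitize(cell.InnerText).Trim())
                .ToList());
        }
        return rows;
    }
}
```

Each inner list is one row, so the results of all detail pages can be appended to a single `List<List<string>>` before exporting.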

To make HTTP requests with C# you should use the HttpClient class.

There are older options, such as WebClient and HttpWebRequest; there is a good answer here on SO to help you decide between them.


For quick reference, Fiddler is available here. I have used it many times too and would recommend it, although it can cause problems with HTTPS traffic and debugging.
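To see the C# requests themselves in Fiddler and compare them with the browser's, the client can be pointed at Fiddler as a proxy. A minimal sketch, assuming Fiddler Classic listening on its default address `127.0.0.1:8888`; for HTTPS traffic, Fiddler's "Decrypt HTTPS traffic" option has to be enabled and its root certificate trusted on the machine, which is a common source of the HTTPS trouble mentioned above. `DebugClientFactory` and `CreateFiddlerHandler` are hypothetical names.

```csharp
using System.Net;
using System.Net.Http;

public static class DebugClientFactory
{
    // Builds a handler that keeps cookies in the given container (as in the question)
    // and routes all traffic through a local proxy; 127.0.0.1:8888 is an assumed default.
    public static HttpClientHandler CreateFiddlerHandler(
        CookieContainer cookies, string proxyAddress = "http://127.0.0.1:8888")
    {
        return new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            Proxy = new WebProxy(proxyAddress),
            UseProxy = true
        };
    }
}
```

Usage: `var client = new HttpClient(DebugClientFactory.CreateFiddlerHandler(new CookieContainer()));`. Keeping the handler construction separate makes it easy to drop the proxy again once debugging is done.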

Jodrell