
I had a problem scraping data from a web page, which was solved in the question "Scrape data from web page that using iframe c#".

My problem is that they changed the web page, which is now https://webportal.thpa.gr/ctreport/container/track. I don't think it uses iframes anymore, and I cannot get any data back.

Can someone tell me whether I can use the same method to get data from this web page, or should I use a different approach?

I don't know how @coder_b found out that I should use https://portal.thpa.gr/fnet5/track/index.php as the web page, and that I should use

var reqUrlContent = hc.PostAsync(
        url,
        new StringContent(
            $"d=1&containerCode={reference}&go=1",
            Encoding.UTF8,
            "application/x-www-form-urlencoded"))
    .Result;

to pass the variables

EDIT: When I inspect the web page, there is an input that contains the container number:

<input type="text" id="report_container_containerno" name="report_container[containerno]" required="required" class="form-control" minlength="11" maxlength="11" placeholder="E/K για αναζήτηση" value="ARKU2215462">

Can I pass this value using HtmlAgilityPack? After that it should be easy to read the result.

Also, when I check the DocumentNode, it seems to show me the cookie consent page that I am supposed to accept. Can I bypass it or automatically accept the cookies?

rippergr
  • You can use CefSharp: add a browser to your application and work mostly on the client side. You can fill in controls, navigate, get data from JavaScript, and then send it to your application. Look for CefSharp examples. – Victor May 26 '22 at 12:06
  • @Victor I thought of that too, but I think CefSharp is a little slow. When I get data from this page there could be 10 items or more, and I must call it separately for every item that I search for. – rippergr May 26 '22 at 12:11
  • I use CefSharp only when it is not possible to make a direct request to get the HTML (when the page changes dynamically, like infinite scroll loading more content). Do you have a test value for the page? – Victor May 26 '22 at 12:53
  • @Victor Yes, you can use ARKU2215462 as the container number to get the information. – rippergr May 26 '22 at 12:57

1 Answer

Try this:

using System;
using System.IO;
using System.Net;
using System.Text;

public static string Download(string search)
{
    var request = (HttpWebRequest)WebRequest.Create("https://webportal.thpa.gr/ctreport/container/track");

    // Form fields as the browser sends them; "%5B" and "%5D" are the URL-encoded "[" and "]".
    var postData = string.Format("report_container%5Bcontainerno%5D={0}&report_container%5Bsearch%5D=", Uri.EscapeDataString(search));
    var data = Encoding.ASCII.GetBytes(postData);

    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = data.Length;

    // Write the encoded form data into the request body.
    using (var stream = request.GetRequestStream())
    {
        stream.Write(data, 0, data.Length);
    }

    // Send the request and return the response body (the HTML page) as a string.
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var stream = new StreamReader(response.GetResponseStream()))
    {
        return stream.ReadToEnd();
    }
}

Usage:

var html = Download("ARKU2215462");
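
Since you need to look up 10 or more containers, the same POST could also be sent with a shared HttpClient instead of creating an HttpWebRequest per call. This is only a sketch under assumptions: the class name ContainerTracker and the methods BuildForm and DownloadAsync are made up for illustration, and I have not checked that the site answers this variant the same way. FormUrlEncodedContent does the percent-encoding of the field names for you.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper (names are illustrative, not from the original answer).
public static class ContainerTracker
{
    private const string TrackUrl = "https://webportal.thpa.gr/ctreport/container/track";

    // Builds the same two form fields as the postData string above.
    // FormUrlEncodedContent encodes "[" and "]" in the names as %5B and %5D.
    public static FormUrlEncodedContent BuildForm(string containerNo)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("report_container[containerno]", containerNo),
            new KeyValuePair<string, string>("report_container[search]", ""),
        });
    }

    // Posts the form with a caller-supplied HttpClient and returns the HTML.
    public static async Task<string> DownloadAsync(HttpClient client, string containerNo)
    {
        using (var response = await client.PostAsync(TrackUrl, BuildForm(containerNo)))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

You would create one HttpClient, keep it for the lifetime of the application, and call `await ContainerTracker.DownloadAsync(client, "ARKU2215462")` once per container number.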

UPDATE

To find the POST parameters to use, press F12 in the browser to open the dev tools, then select the Network tab. Now fill the search input with your ARKU2215462 and press the button.

That sends a request to the server to get the response, and you can inspect both the request and the response. There are lots of requests (styles, scripts, images...), but you want the one for the HTML page. In this case, look at this:

Analyze request

This is the form data sent with the request. If you click "view source", you get the data encoded as "report_container%5Bcontainerno%5D=ARKU2215462&report_container%5Bsearch%5D=", which is what you need in your code.
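
To see how the encoded string relates to the name attribute of the input in the page, it can be decoded and rebuilt with the Uri helpers. This is a small sketch; the class name FormDataDemo and its two methods are hypothetical and only there for illustration.

```csharp
using System;

// Hypothetical demo class; it only shows how the field names are encoded.
public static class FormDataDemo
{
    // Turns the "view source" form data back into readable field names.
    public static string Decode(string encodedFormData)
    {
        return Uri.UnescapeDataString(encodedFormData);
    }

    // Rebuilds the encoded form data from the plain field names in the HTML.
    public static string BuildPostData(string containerNo)
    {
        return Uri.EscapeDataString("report_container[containerno]") + "="
             + Uri.EscapeDataString(containerNo) + "&"
             + Uri.EscapeDataString("report_container[search]") + "=";
    }
}
```

Decoding the string from the dev tools gives `report_container[containerno]=ARKU2215462&report_container[search]=`, where the first field name is the name attribute of the search input. Calling `FormDataDemo.BuildPostData("ARKU2215462")` produces the encoded form used in the code.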

Victor
  • Great. That seems to bring all the information into the html variable. I will try to filter it to get the data that I need. Thanks for your help. – rippergr May 26 '22 at 15:51
  • Can you please explain how you found the postData parameters? – rippergr Jun 01 '22 at 07:45