I am scraping a web page with C# and HtmlAgilityPack. It works well for me, but I ran into a problem with one website when I need to scrape data from the other pages of a product list. The URL does not contain a page number, so I cannot reach the other pages by changing the link. I found out that the page is changed by the JavaScript "__doPostBack" function, which does not change the URL; it just reloads the page and loads the new data.
This is my code for scraping the code and price of each product on this website. However, there are more products on the other pages (2, 3, 4, 5, ...), and I need to scrape data from all of them. On other websites I can simply pass the link to web.Load("Link"); and it works well, because the link changes when you change the page of the product list. In this example the link does not change when another page of the list is selected.
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;
using System.Windows.Forms;
using HtmlAgilityPack;

public class CodeAndPrice
{
    public string Code { get; set; }
    public string Price { get; set; }
}

public partial class Form1 : Form
{
    DataTable table;
    HtmlWeb web = new HtmlWeb();

    public Form1()
    {
        InitializeComponent();
        InitTable();
    }

    // Creates the table shown in the grid: one row per product.
    private void InitTable()
    {
        table = new DataTable("DataTableTest");
        table.Columns.Add("Code", typeof(string));
        table.Columns.Add("Price", typeof(string));
        dataGridView.DataSource = table;
    }

    // Loads the first page of the product list and extracts the code and price of each product.
    private async Task<List<CodeAndPrice>> DataScraping()
    {
        var page = await Task.Run(() => web.Load("https://www.kilobaitas.lt/Kompiuteriai/Plansetiniai_(Tablet)/CatalogStore.aspx?CatID=PL_626"));
        var codesNodes = page.DocumentNode.SelectNodes("//td[@class='mainContent']//div[@class='itemNormal']//div[@class='itemCode']");
        var pricesNodes = page.DocumentNode.SelectNodes("//td[@class='mainContent']//div[@class='itemNormal']//div[@class='itemCode']//parent::div//div[@class='itemBoxPrice']");
        if (codesNodes == null || pricesNodes == null)
            return new List<CodeAndPrice>();
        // Strip the "kodas:" label and the currency sign first, then remove the remaining spaces.
        var codes = codesNodes.Select(node => node.InnerText.Replace("kodas", "").Replace(":", "").Replace(" ", ""));
        var prices = pricesNodes.Select(node => node.InnerText.Replace("€", "").Replace(" ", ""));
        return codes.Zip(prices, (code, price) => new CodeAndPrice() { Code = code, Price = price }).ToList();
    }

    private async void Form1_Load(object sender, EventArgs e)
    {
        var results = await DataScraping();
        foreach (var rez in results)
        {
            table.Rows.Add(rez.Code, rez.Price);
        }
    }
}
When I run __doPostBack('designer1$ctl11$ctl00$MainCatalogSquare1$XDataPaging1','paging.1'); in the browser's console, page 2 is loaded. By changing the number in "paging.*", the browser loads page * + 1.
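As far as I understand, on a standard ASP.NET WebForms page __doPostBack(target, argument) only writes its two arguments into the hidden form fields __EVENTTARGET and __EVENTARGUMENT and then submits the form. If this site follows that convention (an assumption, I have not confirmed it), the same request could be reproduced without a browser by posting all hidden fields back to the same URL. Below is a minimal sketch of that idea; PostBackPaging, BuildPostBackFields and LoadPageAsync are hypothetical names I made up for illustration.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class PostBackPaging
{
    // Copies every hidden input of an already loaded page (e.g. __VIEWSTATE)
    // and fills in the two fields that __doPostBack would set before submitting.
    // Assumption: the site uses the standard WebForms hidden-field names.
    public static Dictionary<string, string> BuildPostBackFields(
        HtmlDocument page, string eventTarget, string eventArgument)
    {
        var fields = new Dictionary<string, string>();
        var hiddenInputs = page.DocumentNode.SelectNodes("//input[@type='hidden']");
        if (hiddenInputs != null)
        {
            foreach (var input in hiddenInputs)
            {
                var name = input.GetAttributeValue("name", "");
                if (name.Length > 0)
                    fields[name] = HtmlEntity.DeEntitize(input.GetAttributeValue("value", ""));
            }
        }
        fields["__EVENTTARGET"] = eventTarget;
        fields["__EVENTARGUMENT"] = eventArgument;
        return fields;
    }

    // Hypothetical helper: posts the fields back to the same URL and parses the
    // response. pageIndex is zero-based, so "paging.1" should return page 2.
    public static async Task<HtmlDocument> LoadPageAsync(
        HttpClient client, string url, HtmlDocument currentPage, int pageIndex)
    {
        var fields = BuildPostBackFields(
            currentPage,
            "designer1$ctl11$ctl00$MainCatalogSquare1$XDataPaging1",
            "paging." + pageIndex);

        using (var response = await client.PostAsync(url, new FormUrlEncodedContent(fields)))
        {
            response.EnsureSuccessStatusCode();
            var html = await response.Content.ReadAsStringAsync();
            var nextPage = new HtmlDocument();
            nextPage.LoadHtml(html);
            return nextPage;
        }
    }
}
```

With this approach the first page would probably have to be fetched with the same HttpClient (so that any session cookies match) and parsed with HtmlDocument.LoadHtml instead of web.Load. I am also not sure whether the site needs additional headers, or whether a very large __VIEWSTATE value could cause problems with FormUrlEncodedContent on older .NET Framework versions.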
What is the simplest way to trigger this JavaScript postback from my scraper, so that I can switch pages and scrape the data from the other pages of this website?