What you need to do boils down to the following steps:
- Identify the URLs you need to parse. In your case the results are loaded via AJAX. Right-click the page, click 'Inspect element' and open the Network tab. You'll see that the actual URL is:
http://steamcommunity.com/market/search/render/?query=&start=<STARTVALUE>&count=<NUMBEROFRESULTS>&search_descriptions=0&sort_column=quantity&sort_dir=desc&appid=730&category_730_ItemSet%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Pistol&category_730_Type%5B%5D=tag_CSGO_Type_SMG&category_730_Type%5B%5D=tag_CSGO_Type_Rifle&category_730_Type%5B%5D=tag_CSGO_Type_SniperRifle&category_730_Type%5B%5D=tag_CSGO_Type_Shotgun&category_730_Type%5B%5D=tag_CSGO_Type_Machinegun&category_730_Type%5B%5D=tag_CSGO_Type_Knife
- Identify the response type. In this case it is JSON, and the data we want is inside an HTML snippet.
- Find the tools required to parse it. You can use
json_decode(...)
to decode the JSON string. This question will give more information on how to parse HTML.
- You can now feed these URLs to a function that loads the page. You can use
file_get_contents(...)
or the curl library.
- Insert the values you parse from the response into your database. Make sure that the script does not get killed when it runs for too long. This question will give you more information about that.
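As a rough sketch of the decoding and parsing steps above, the snippet below decodes a JSON string and pulls values out of the embedded HTML with PHP's DOM extension. The sample response, the class names `item`, `name` and `price`, and the helper `parseItems` are all made up for illustration; the real markup has to be inspected in the browser first.

```php
<?php
// Hypothetical sample data: the real results_html from Steam will look different.
$sampleHtml = '<div class="item"><span class="name">AK-47</span><span class="price">$5.00</span></div>'
            . '<div class="item"><span class="name">P250</span><span class="price">$0.10</span></div>';
$json = json_encode( array( 'success' => true, 'pagesize' => 2, 'total_count' => 2, 'results_html' => $sampleHtml ) );

// Hypothetical helper: decodes the response and returns a list of name/price pairs.
function parseItems( string $json ): array {
    $result = json_decode( $json, true );
    if( !is_array( $result ) || empty( $result['success'] ) ) {
        return array();
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them
    $previous = libxml_use_internal_errors( true );
    // The XML declaration is a common trick to make loadHTML treat the input as UTF-8;
    // the wrapping div guarantees the input is never empty
    $doc->loadHTML( '<?xml encoding="UTF-8"><div>' . $result['results_html'] . '</div>' );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath = new DOMXPath( $doc );
    $items = array();
    foreach( $xpath->query( '//div[@class="item"]' ) as $node ) {
        // Look up the name and price relative to the current item node
        $name  = $xpath->query( './/span[@class="name"]', $node )->item( 0 );
        $price = $xpath->query( './/span[@class="price"]', $node )->item( 0 );
        if( $name !== null && $price !== null ) {
            $items[] = array( 'name' => trim( $name->textContent ), 'price' => trim( $price->textContent ) );
        }
    }
    return $items;
}

print_r( parseItems( $json ) );
```

This assumes the DOM extension is installed. Any other HTML parser works just as well; only the XPath expressions need to match the real structure.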
You can use the following as a framework. You'll have to figure out the structure of the HTML yourself, and look up a tutorial for the HTML parser and MySQL library you want to use.
<?php
//Prevent this script from being killed. Please note that if this script never
//ends, you'll have to kill it manually
set_time_limit( 0 );
//The API does not allow for more than 100 results at a time
$start = 0;
$count = 100;
$maxresults = PHP_INT_MAX;
$baseurl = "http://steamcommunity.com/market/search/render/?query=&start=$1&count=$2&search_descriptions=0&sort_column=quantity&sort_dir=desc&appid=730&category_730_ItemSet%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Pistol&category_730_Type%5B%5D=tag_CSGO_Type_SMG&category_730_Type%5B%5D=tag_CSGO_Type_Rifle&category_730_Type%5B%5D=tag_CSGO_Type_SniperRifle&category_730_Type%5B%5D=tag_CSGO_Type_Shotgun&category_730_Type%5B%5D=tag_CSGO_Type_Machinegun&category_730_Type%5B%5D=tag_CSGO_Type_Knife";
while( $start < $maxresults ) {
    //Constructing the next url. "$1" and "$2" are not valid variable names,
    //so PHP leaves them in $baseurl as literal placeholders
    $url = str_replace( "$1", $start, $baseurl );
    $url = str_replace( "$2", $count, $url );
    //Doing the request
    $ch = curl_init();
    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
    $result = json_decode( curl_exec( $ch ), TRUE );
    curl_close( $ch );
    //Doing things with the result
    //
    //First let's see if everything went according to plan. $result is NULL
    //when the request failed or the response was not valid json
    if( !is_array( $result ) || !isset( $result["success"] ) || $result["success"] !== TRUE ) {
        echo "Something went horribly wrong. Please edit the script to take this error into account and rerun it.";
        //Exit codes should stay between 0 and 254
        exit( 1 );
    }
    //Bookkeeping for the next url we have to fetch
    $count = $result["pagesize"];
    $maxresults = $result["total_count"];
    //An empty page would never advance $start, so stop instead of looping forever
    if( $count <= 0 ) {
        break;
    }
    $start += $count;
    //This is the html we have to parse
    $html = $result["results_html"];
    //Look up an example how to parse html, and how to get data from it
    //Look up how to make a database connection and how to insert data into
    //your database
}
echo "And we are done!";
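For the database step that the framework leaves open, a minimal sketch using PDO with a prepared statement could look like the following. The table `market_items`, its columns, and the helper `saveItems` are hypothetical, and an in-memory SQLite database stands in for your real one so the snippet runs on its own (this assumes the pdo_sqlite extension). For MySQL you would pass a `mysql:host=...;dbname=...` DSN plus credentials instead.

```php
<?php
// Hypothetical helper: inserts all parsed items in one transaction and
// returns the number of rows written.
function saveItems( PDO $pdo, array $items ): int {
    // A prepared statement keeps scraped text from being interpreted as SQL
    $stmt = $pdo->prepare( 'INSERT INTO market_items (name, price) VALUES (:name, :price)' );
    $pdo->beginTransaction();
    try {
        foreach( $items as $item ) {
            $stmt->execute( array( ':name' => $item['name'], ':price' => $item['price'] ) );
        }
        $pdo->commit();
    } catch( Throwable $e ) {
        // Undo the partial page so a rerun does not produce duplicates
        $pdo->rollBack();
        throw $e;
    }
    return count( $items );
}

// Assumption: in-memory SQLite as a stand-in for the real database
$pdo = new PDO( 'sqlite::memory:' );
$pdo->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );
$pdo->exec( 'CREATE TABLE market_items (name TEXT NOT NULL, price TEXT NOT NULL)' );

// Made-up rows in the shape a parser might return
$saved = saveItems( $pdo, array(
    array( 'name' => 'AK-47', 'price' => '$5.00' ),
    array( 'name' => 'P250',  'price' => '$0.10' ),
) );
echo "Saved $saved rows\n";
```

Calling something like this once per fetched page, inside the loop, keeps memory use flat and means a failure halfway through loses at most one page of results.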