I have a comparison website, on which I'm every night updating the feeds from the stores. These are generally not the largest feeds (ie. max. 15.000 rows), but even a CSV feed with only 1000 rows can take ages because of the (many?) SQL-queries I have to do. This is my procedure:
- Every CSV feed comes from a different affiliate network, and so, has a different index and structure. This is why I've programmed a page that first standardize every feed to the right structure (ie. $row['name'] = $col[1] in one feed, but $col[2] in another feed). These 'right' data will be processed by another function CheckProduct();.
- In CheckProduct() I'm first checking if a product already exists. If yes, then I'm updating the standard information: current price, stock, url etc. If no, I'm inserting the product into the database.
But that last point takes so much time, that even one extra query can take another few minutes. I don't think it has to take so long, but I keep looking to this code and looking for SQL-query optimization, but I can not seem to figure out how I can keep this procedure, when I'm bulk or combining CSV rows. I already tried to optimize queries (using COUNT() instead of num_rows, only selecting the columns I need, etc.).
I'm not an expert in programming, but I know there must be a different option. I know you can store things in an array and then loop that, or use LOAD DATA INFILE, but I don't know how to programme that in this case (and still have the same kind of strict checks every row). In other words: how can I optimize this code?
This is the function CheckProduct()
function CheckProduct($SiteID, $StoreID, $FeedID, $Name, $URL, $Description, $EANSKU, $Image, $Brand, $Color, $Price, $CategoryPath, $Stock, $Deliverycosts, $Deliverytime, $Length, $Width, $Depth, $Height, $Material)
{
global $db;
if ( !filter_var($URL, FILTER_VALIDATE_URL) === false && !filter_var($Image, FILTER_VALIDATE_URL) === false && !empty($Name) && !empty($URL) && !empty($Price) && !empty($Image) ) {
$sCountProduct = $db->query("SELECT COUNT(*) as total FROM furniture WHERE name_slug='".CreateSlug(trim($db->real_escape_string($Naam)))."' AND feed_id='".$FeedID."' LIMIT 1");
$fCountProduct = $sCountProduct->fetch_assoc();
if($fCountProduct['total'] == '0') {
$iProduct = $db->query("INSERT INTO furniture (site_id,feed_id,store_id,name,name_slug,affiliate_url,description,ean_sku,image_big,brand,brand_slug,color,color_slug,price,category_path,in_stock,visible,shipping_costs,check_today,last_update,deliverytime,length,width,depth,height,material,material_slug,added) VALUES ('1','".$FeedID."','".$StoreID."','".trim($db->real_escape_string($Name))."','".CreateSlug(trim($db->real_escape_string($Name)))."','".trim($AffiliateURL)."','".$db->real_escape_string(trim($Description))."', '".$db->real_escape_string(trim($EANSKU))."','".trim($Afbeelding)."','".$db->real_escape_string(trim($Brand))."','".CreateSlug($db->real_escape_string(trim($Brand)))."','".$db->real_escape_string(trim($Color))."','".CreateSlug($db->real_escape_string(trim($Color)))."','".$db->real_escape_string(trim($Price))."','".$db->real_escape_string(trim(strtolower($CategoryPath)))."','".$Stock."', '3','".$db->real_escape_string(trim($Deliverycosts))."','1','".time()."','".$db->real_escape_string(trim($Deliverytime))."','".$db->real_escape_string(trim($Length))."','".$db->real_escape_string(trim($Width))."','".$db->real_escape_string(trim($Depth))."','".$db->real_escape_string(trim($Height))."','".$db->real_escape_string(trim($Material))."','".CreateSlug($db->real_escape_string(trim($Material)))."','".date('d-m-Y')."')");
$IDProduct = $db->insert_id;
if($iProduct) {
$dOthers = $vm->query("DELETE FROM furniture WHERE name_slug='".CreateSlug(trim($db->real_escape_string($Name)))."' AND id != '".$IDProduct."' AND feed_id='".$FeedID."' AND visible != '1'");
}
}
else {
$sExistProduct = $db->query("SELECT id,site_id,name_slug,feed_id,price_old,visible,price FROM furniture WHERE site_id='1' AND name_slug='".CreateSlug(trim($db->real_escape_string($Name)))."' AND feed_id='".$FeedID."' LIMIT 1");
if(!$sExistProduct) { }
else {
// Check if it is a salesproduct
$fExistProduct = $sExistProduct->fetch_assoc();
$OudePrijs = $fExistProduct['price_old'];
$Zichtbaar = $fExistProduct['visible'];
if($fExistProduct['visible'] == '2'){ $Visible = '1'; }
if($Price < $fExistProduct['price']){ $OldPrice = $fExistProduct['price']; }
$uProduct = $db->query("UPDATE furniture SET affiliate_url='".trim($URL)."', description='".$db->real_escape_string(trim($Description))."', price='".$db->real_escape_string(trim($Price))."', price_old='".$db->real_escape_string(trim($OldPrice))."', in_stock='".$Stock."', shipping_costs='".$db->real_escape_string(trim($Deliverycosts))."', check_today='1', last_update='".time()."', deliverytime='".$Deliverytime."', visible='".$Visible."' WHERE id='".$fExistProduct['id']."' LIMIT 1");
if($uProduct) {
// Updated
}
}
}
}
else
{
// Error in image (not every image send by the feed is a good one)
}
}
This is how I process one of my feeds to match the correct columns:
$SiteID = '1';
$FeedID = '1';
$StoreID = '1';
$Link = 'URL';
if (($handle = fopen($Link, "r")) !== FALSE) {
fgetcsv($handle);
$i = 0;
while (($data = fgetcsv($handle, 6000, ";")) !== FALSE) {
$num = count($data);
for ($c=0; $c < $num; $c++){ $col[$c] = $data[$c]; }
### VARIABLES FROM FEED ###
$Name = $col[1];
$URL = $col[5];
$Description = strip_tags($col[4]);
$EANSKU = $col[0];
$Image = $col[6];
$Brand = $col[8];
$Color = '';
$Price = $col[3];
$CategoryPath = $col[9];
$Stock = $col[10];
$Deliverycosts = $col[14];
$Deliverytime = '';
$Length = '';
$Width = '';
$Height = '';
$Depth = '';
$Material = '';
### STOCK ###
if($Stock > 0) {
$Stock = '1';
}
else {
$Stock = '0';
}
### SEND PRODUCT ###
CheckProduct($SiteID, $StoreID, $FeedID, $Name, $URL, $Description, $EANSKU, $Image, $Brand, $Color, $Price, $CategoryPath, $Stock, $Deliverycosts, $Deliverytime, $Length, $Width, $Depth, $Height, $Material);
$i++;
}
fclose($handle);
}
I'm using PHP 5.6.32 and MySQL 5.0.11.