
I have a MySQL table hravaj00_dily with the columns part_id, img150 and imgfull. The img150 and imgfull columns store image URLs. By the way, the table is updated from an XML feed.

Is there a PHP solution that goes through the img150 (or imgfull) column, checks whether each URL exists (i.e. does not return a 404 error), and deletes all rows with non-existent URLs from the database?

I have read about the function below, which checks the HTTP headers of a URL. Is it useful here? I have no idea how exactly to use it.

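function file_external_exists($url)
{
    // get_headers() returns false on failure; @ suppresses its warning
    $headers = @get_headers($url);
    if ($headers === false) {
        return false;
    }
    // $headers[0] is the status line, e.g. "HTTP/1.1 200 OK"
    return (bool) preg_match('|^HTTP/\S+\s+200\b|', $headers[0]);
}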
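A side note on get_headers(): with its default settings it follows redirects, and the numerically indexed array it returns then holds one status line per response, so $headers[0] may be a 301 or 302 rather than the final 200 or 404. Below is a minimal sketch of a helper that reads the last status line instead; the name finalStatusCode() is hypothetical, not part of PHP.

```php
<?php
/**
 * Return the status code of the final response in a get_headers() result.
 * When redirects are followed, the numerically indexed result contains one
 * "HTTP/x.y NNN ..." line per response; the last one is the final status.
 * Returns null if get_headers() returned false or no status line was found.
 */
function finalStatusCode($headers): ?int
{
    if (!is_array($headers)) {
        return null; // get_headers() returns false on failure
    }
    $code = null;
    foreach ($headers as $key => $line) {
        // Status lines have integer keys and look like "HTTP/1.1 404 Not Found".
        if (is_int($key) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1]; // keep overwriting, so the last one wins
        }
    }
    return $code;
}
```

It could stand in for the preg_match() check, e.g. `return finalStatusCode(@get_headers($url)) === 200;`.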
jaredk
6fix
  • Possible duplicate of http://stackoverflow.com/questions/408405/easy-way-to-test-a-url-for-404-in-php – Yashankit Vyas Apr 02 '14 at 13:14
  • curl is your bottleneck; you need parallel requests here. Have a look at https://github.com/Bonnevoy/php-mcurl or something similar. – vp_arth Apr 02 '14 at 15:35
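Parallel requests do not strictly need the linked library: PHP's own curl_multi_* functions can run several transfers at once. The following is a minimal sketch (checkUrlsParallel() is a hypothetical name) that starts one request per URL, waits for all of them, and returns each URL's HTTP status code, where 0 means the request failed outright.

```php
<?php
/**
 * Check several URLs concurrently with PHP's built-in curl_multi_* API.
 * Returns [key => HTTP status code] using the keys of $urls; a code of 0
 * means the request itself failed (DNS error, refused connection, timeout).
 */
function checkUrlsParallel(array $urls, int $timeoutSeconds = 10): array
{
    if ($urls === []) {
        return [];
    }

    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // keep bodies out of the output
        curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $handle);
        $handles[$key] = $handle;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi); // wait for socket activity instead of spinning
        }
    } while ($running && $status === CURLM_OK);

    $codes = [];
    foreach ($handles as $key => $handle) {
        $codes[$key] = curl_getinfo($handle, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $handle);
        curl_close($handle);
    }
    curl_multi_close($multi);

    return $codes;
}
```

Starting thousands of transfers at once would open thousands of connections, so feeding it slices keyed by row id is probably wiser: `foreach (array_chunk($urls, 50, true) as $chunk) { $codes += checkUrlsParallel($chunk); }`, after which `array_keys(array_filter($codes, fn($c) => $c !== 200))` gives the ids of the rows to delete.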

2 Answers

$con = mysqli_connect("example.com", "peter", "abc123", "my_db");
if (!$con) {
  die("Connection failed: " . mysqli_connect_error());
}

// Only the primary key and the URL column are needed
$result = mysqli_query($con, "SELECT id, img150 FROM hravaj00_dily");

$nonExistent = []; // initialise so the check below works when nothing is missing
while ($row = mysqli_fetch_assoc($result)) {
  if (!urlExists($row['img150'])) {
    $nonExistent[] = (int) $row['id']; // Assuming `id` is the primary key
  }
}

if ($nonExistent) {
  // Ids were cast to int above, so inlining them is safe; IN needs parentheses
  $nonExistentCSV = implode(",", $nonExistent);
  $delQuery = "DELETE FROM hravaj00_dily WHERE id IN (" . $nonExistentCSV . ")";
  mysqli_query($con, $delQuery);
}

mysqli_close($con);

// Ref: http://stackoverflow.com/questions/408405/easy-way-to-test-a-url-for-404-in-php
function urlExists($url) {
  $handle = curl_init($url);
  curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE); // don't echo the response body
  curl_exec($handle);

  $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
  curl_close($handle);
  return $httpCode == 200;
}
  • I read all the rows and make a curl request for each URL to check whether it exists. Once all the URLs have been checked, I delete the invalid rows with a single query.
  • It's better to run a small number of database queries, and it's best not to run a query inside a loop. You may consider running the queries in batches of 100 or 1000 inside a loop.
  • You might want to pause between requests with the sleep() function; otherwise, if the image server is overloaded, it might block your requests.
  • You might not want to check all rows at once; it's better to fetch a limited number, such as 100 or 1000, depending on what your server can handle.
  • You might have to check whether the script runs for more than 30 seconds, which is the default max_execution_time in php.ini.
  • You might have to increase the maximum memory allowed for the PHP script (memory_limit in php.ini).
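The batching advice above, applied to the final DELETE, might look like the minimal sketch below. buildDeleteBatches() and deleteInBatches() are hypothetical helpers; the table name comes from the question and the `id` column is the same assumption the answer makes. Placeholders in a prepared statement also avoid concatenating values into the SQL.

```php
<?php
/**
 * Split a list of ids into parameterised DELETE statements of at most
 * $batchSize ids each, so one huge IN (...) list is never sent at once.
 * Table and column names are inlined into the SQL, so they must be trusted
 * constants, never user input.
 */
function buildDeleteBatches(array $ids, int $batchSize = 100,
                            string $table = 'hravaj00_dily', string $column = 'id'): array
{
    $batches = [];
    foreach (array_chunk(array_values($ids), $batchSize) as $chunk) {
        $placeholders = implode(',', array_fill(0, count($chunk), '?'));
        $batches[] = [
            'sql'    => "DELETE FROM `$table` WHERE `$column` IN ($placeholders)",
            'params' => $chunk,
        ];
    }
    return $batches;
}

/**
 * Run the batches through mysqli prepared statements (one query per batch).
 * Assumes integer ids, hence the 'i' type for every placeholder.
 */
function deleteInBatches(mysqli $con, array $ids, int $batchSize = 100): void
{
    foreach (buildDeleteBatches($ids, $batchSize) as $batch) {
        $stmt = mysqli_prepare($con, $batch['sql']);
        if ($stmt === false) {
            throw new RuntimeException(mysqli_error($con));
        }
        $params = $batch['params'];
        mysqli_stmt_bind_param($stmt, str_repeat('i', count($params)), ...$params);
        mysqli_stmt_execute($stmt);
        mysqli_stmt_close($stmt);
    }
}
```

With these helpers, `deleteInBatches($con, $nonExistent, 100);` would take the place of the single DELETE query in the code above.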
Rahul Prasad
  1. Get all records.
  2. Iterate over them.
  3. For each record, call this function to check whether its URL exists.
  4. If it does not exist, delete the record by its ID.
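The four steps above can be sketched as a small function. This is a variation rather than a literal translation: instead of deleting inside the loop, it collects the ids first so they can be removed with one query afterwards. findMissingIds() is a hypothetical name, the `id`/`img150` column names are assumptions, and the URL check is passed in as a callable, so either a get_headers()-based check or the urlExists() function from the other answer fits.

```php
<?php
/**
 * Steps 1-4 with the side effects injected, so the logic can be tried
 * without a database or network:
 *  - $rows:   rows already fetched from the table (step 1)
 *  - $exists: callable(string $url): bool that checks one URL
 * Returns the ids of rows whose URL does not exist (to delete in step 4).
 */
function findMissingIds(iterable $rows, callable $exists,
                        string $urlColumn = 'img150', string $idColumn = 'id'): array
{
    $missing = [];
    foreach ($rows as $row) {                 // step 2: iterate
        if (!$exists($row[$urlColumn])) {     // step 3: check the URL
            $missing[] = $row[$idColumn];     // step 4: mark for deletion
        }
    }
    return $missing;
}
```

Because a mysqli_result can be iterated with foreach, a call such as `findMissingIds(mysqli_query($con, "SELECT id, img150 FROM hravaj00_dily"), 'urlExists')` would return the ids to delete.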
Cysioland