I'm currently scraping one of my own websites using simple_html_dom. Instead of making hundreds of database calls for the latest singles, I decided to store the URLs that have already been published in a text file, to prevent duplicate posts.
Here's my current loop.
$url = $element->href;
$file = file_get_contents('album.txt');
if (strpos($file, $url) !== false) {
    echo 'This Album Has Already Been Published';
} else {
    // do something in the loop
    file_put_contents('album.txt', $url . PHP_EOL, FILE_APPEND);
}
Here's where the problem comes in: after about a day it stores 400+ URLs in this file, which is problematic because I only need it to keep the latest (around 50) URLs in the text document.
How can I remove everything except the latest added 50 results from my text document?
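One possible approach, sketched below under the assumption that the file holds one URL per line and new URLs are appended to the end (as in the loop above): read the file into an array of lines, keep only the last 50 with array_slice(), and write them back. The function name trimUrlLog and the $limit parameter are placeholders, not part of the original code.

```php
<?php
// Sketch: keep only the newest $limit lines of a log file.
// Assumes one URL per line, newest entries appended at the end.
function trimUrlLog(string $path, int $limit = 50): void
{
    if (!file_exists($path)) {
        return;
    }
    // Read the file as an array of lines, skipping blanks and newlines.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if (count($lines) <= $limit) {
        return; // nothing to trim yet
    }
    // Keep only the last $limit entries (the most recently appended).
    $lines = array_slice($lines, -$limit);
    file_put_contents($path, implode(PHP_EOL, $lines) . PHP_EOL);
}
```

You could call trimUrlLog('album.txt') once per scrape run, after the loop finishes appending, so the file never grows much beyond 50 entries.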