
Currently I am scraping one of my own websites using simple_html_dom, and instead of making hundreds of database calls for the latest singles, I decided to store the URLs that have already been published in a text file to prevent duplicate posts.

Here's my current loop.

$url  = $element->href;
$file = file_get_contents('album.txt');

if (strpos($file, $url) !== false) {
    echo 'This Album Has Already Been Published';
} else {
    // do something in the loop, then record the URL so it is not posted again
    file_put_contents('album.txt', $url . PHP_EOL, FILE_APPEND);
}

Alright, here's where the problem comes in: after about one day it stores 400+ URLs in this text file, which is problematic for me considering I only need it to keep the latest posts, around 50 URLs, stored in this text document.

How can I remove everything except the latest added 50 results from my text document?
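One way to sketch this (not tested against the original scraper; `keepLatest` is an illustrative helper name) is to read the file as an array of lines, keep only the last 50 with `array_slice`, and write them back. This assumes one URL per line with the newest appended last, as in the loop above.

```php
<?php
// Trim a line-oriented file down to its newest $keep entries.
// Assumes newest lines are appended at the end of the file.
function keepLatest(string $path, int $keep = 50): void
{
    if (!file_exists($path)) {
        return;
    }

    // Read the file into an array, one element per line, skipping blanks.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    if (count($lines) > $keep) {
        // A negative offset to array_slice takes the last $keep elements.
        $lines = array_slice($lines, -$keep);
        file_put_contents($path, implode(PHP_EOL, $lines) . PHP_EOL);
    }
}

keepLatest('album.txt');
```

Running this once per scrape (after the append loop finishes) keeps the file bounded at 50 lines instead of rewriting it on every iteration.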

John Doe
  • *"I decided to just store the url's which have already been published via a text file to prevent duplicate posts."* - Why didn't you just add a UNIQUE constraint for it instead? *"instead of making hundreds of calls to a database"* - That may be the way you set it up. – Funk Forty Niner Dec 15 '17 at 18:58
  • @FunkFortyNiner I want to avoid storing these urls in my database. – John Doe Dec 15 '17 at 18:58
  • that's where relational tables come in handy. Files are a lot of work. – Funk Forty Niner Dec 15 '17 at 19:00
  • @FunkFortyNiner Your question has absolutely nothing to do with what he asked – Land Dec 15 '17 at 19:00
  • @Land that's just part of the problem. – Funk Forty Niner Dec 15 '17 at 19:01
  • Read the last 50 lines of the text document and rewrite it with only what you read out. Here's a question on reading from the end of a file in php: https://stackoverflow.com/questions/15025875/what-is-the-best-way-in-php-to-read-last-lines-from-a-file – Kallmanation Dec 15 '17 at 19:04
  • I think you can just take the code from [this](https://stackoverflow.com/a/47837529/2191572) answer and replace `125000` with `50`. – MonkeyZeus Dec 15 '17 at 19:04
  • See https://stackoverflow.com/questions/5712878/how-to-delete-a-line-from-the-file-with-php – kemika Dec 15 '17 at 19:05
  • So you've got a problem, tried to solve it by using a workaround (instead of fixing the problem) and now that workaround has the exact same problem? That's called an XY problem. Use an UNIQUE index and perform INSERT IGNORE. You can insert multiple rows in one query by the way, so there isn't even the need to making hundreds of calls. – Stephan Vierkant Dec 15 '17 at 19:08
  • Besides the duplicate that the question was closed with, you might like to look into [Delete first X lines of a database](https://stackoverflow.com/questions/12779978/delete-first-x-lines-of-a-database). I'd stay with the db, IMHO but that's up to you. Edit: and [Leave only first 50 records in SQL database and delete the rest](https://stackoverflow.com/questions/21255578/leave-only-first-50-records-in-sql-database-and-delete-the-rest) with a bit of modification. – Funk Forty Niner Dec 15 '17 at 19:14
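Several of the comments above suggest the database route with a UNIQUE index and INSERT IGNORE. A minimal sketch of that schema, with an illustrative table name (`published_albums`), might look like:

```sql
-- Hypothetical table: the UNIQUE key on url makes duplicate rows impossible,
-- and INSERT IGNORE silently skips any row that would violate it.
CREATE TABLE published_albums (
    id  INT AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(255) NOT NULL,
    UNIQUE KEY uniq_url (url)
);

-- Multiple URLs can be recorded in a single query, so there is no need
-- for hundreds of separate calls.
INSERT IGNORE INTO published_albums (url) VALUES
    ('http://example.com/album-1'),
    ('http://example.com/album-2');
```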

0 Answers