
I am having a little trouble checking the DB in Laravel. I have scraped data and am inserting it into the DB. But before inserting, I want to check whether the same data already exists (for example, the same URL). If the same URL exists, the insert should be skipped.

What I have done so far is below.

$i = 0;
$database = [];
foreach($placeUrls as $k => $urls) {
    $database = [
        "place_id" => $k,
        "website" => "a-site",
        "place_name" => $names[$k],
        "url" => $urls,
    ];

    if ($plan = Plan::where("url", "=", $urls)->first()) {
        if ($plan->url != $database["url"]) {
            $this->line("plan inserted");
            Plan::insertGetId($database);
        }
    }

    $i++;
}

But the checking part is not correct. How can I fix it?

tate

3 Answers


Scrapers and crawlers are very resource-intensive applications, so I prefer to avoid the extra DB select that checks for the same URL before each insert.

In my simple crawler I added a column to the URLs table that holds the URL hash, and added a UNIQUE index on that column.

ALTER TABLE urls ADD COLUMN url_hash char(32) NOT NULL UNIQUE

You can hash the URL with something fast, such as the MD5 algorithm:

$hash = md5($method . $domain . $url);

You can go with this option too for hashing.

That will allow you to insert every URL you collect, without selecting it first from the database, and let the database deal with the uniqueness problem at a lower level.

WARNING: don't change the way you create the hash in the future, or you will end up with many duplicate URLs.
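
As a rough illustration of the idea above, here is a minimal standalone sketch. It assumes an in-memory SQLite database through PDO in place of the real one, a hypothetical `urlHash()` helper, and made-up sample URLs with `GET` and `example.com` as the method and domain; none of these come from the original crawler. The point is only that the UNIQUE index lets the database reject the duplicate without a prior select.

```php
<?php
// In-memory SQLite stands in for the real database (assumption for this sketch).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Same shape as the ALTER TABLE above: a 32-character hash column with a UNIQUE index.
$pdo->exec('CREATE TABLE urls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL,
    url_hash CHAR(32) NOT NULL UNIQUE
)');

// Hypothetical helper: hashes the same three parts as the md5() call above.
function urlHash(string $method, string $domain, string $url): string
{
    return md5($method . $domain . $url);
}

// SQLite's INSERT OR IGNORE silently skips rows that violate the UNIQUE index.
// MySQL spells the equivalent statement INSERT IGNORE.
$insert = $pdo->prepare('INSERT OR IGNORE INTO urls (url, url_hash) VALUES (?, ?)');

// Made-up sample data; the last entry repeats the first.
$scraped = ['/plans/1', '/plans/2', '/plans/1'];
foreach ($scraped as $url) {
    $insert->execute([$url, urlHash('GET', 'example.com', $url)]);
}

// Only the two distinct URLs end up in the table.
$total = (int) $pdo->query('SELECT COUNT(*) FROM urls')->fetchColumn();
echo "rows in table: $total\n";
```

Without the `OR IGNORE` clause the duplicate insert would throw a `PDOException` instead, which can be caught if the crawler needs to know that a URL was already stored.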

Accountant م

It seems that $urls is an array, so let's modify the code a bit.

$i = 0;
$database = [];

foreach($placeUrls as $k => $urls) {
    $database = [
        "place_id" => $k,
        "website" => "a-site",
        "place_name" => $names[$k],
        "url" => $urls,
    ];

    if ( ! $plan = Plan::whereIn("url", $urls)->first())
    { // ^              ^^^^^^^^^^^^^^^^^^^^^
        // $plan is null in this branch, so reading $plan->url would fail.
        // No matching row was found, so insert directly.
        $this->line("plan inserted");
        Plan::insertGetId($database);
    }

    $i++;
}

The important part is the first conditional: the block is entered only if there is no plan whose url is included in $urls, because you want to avoid a duplicate entry.
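
If each $urls value turns out to be a single URL string rather than an array, the same "insert only when missing" logic could be sketched with Eloquent's `firstOrCreate`. This is only a sketch: it has to run inside the Laravel console command from the question, and it assumes the `place_id`, `website`, `place_name` and `url` columns are mass assignable on the `Plan` model.

```php
<?php
// Sketch only: runs inside a Laravel console command, not standalone.
// Assumes $placeUrls maps place IDs to single URL strings and that
// the columns used below are mass assignable on Plan.
foreach ($placeUrls as $k => $url) {
    // Look up a row by "url"; create it with the extra columns if none exists.
    $plan = Plan::firstOrCreate(
        ["url" => $url],
        [
            "place_id" => $k,
            "website" => "a-site",
            "place_name" => $names[$k],
        ]
    );

    // wasRecentlyCreated is true only when this call inserted the row.
    if ($plan->wasRecentlyCreated) {
        $this->line("plan inserted");
    }
}
```

This still issues a select before each insert, so it trades some database load for simpler code.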

Kenny Horna

You can check the result with empty():

$plan = Plan::where("url", "=", $urls)->first();
if (empty($plan->id)) {
    // No existing row has this URL, so it is safe to insert.
    $this->line("plan inserted");
    Plan::insertGetId($database);
}

Alternatively, you can use request validation:

'url' => 'unique:plan'
Shankar S Bavan
  • Actually, I am trying to check the URLs, because the next time I scrape data I don't want to insert data for the same URL into the DB. To prevent that, I want to check whether the URL is already in the DB. – tate Aug 22 '19 at 03:11