2

I have a CSV database of 1M rows that I'm ingesting into Laravel using the following Seeder:

public function run()
{
DB::table("terms")->truncate();

DB::beginTransaction();

$csvFile = fopen(base_path("database/terms.csv"), "r");

while (($data = fgetcsv($csvFile, 100, ",")) !== FALSE) {
  DB::table("terms")->insert([
      "serial" => $data['0'],
      "output" => $data['1']
  ]);
}

fclose($csvFile);

DB::commit();
}

And here's the migration that the seeder is being run from:

public function up()
{  
    Schema::create('items', function (Blueprint $table) {
        $table->mediumIncrements("id");
        $table->string("serial")->unique();
        $table->string("output");
    });

    $seeder = new ItemsSeeder();
    $seeder->run();
}

I can't use chunking because it doesn't make sense for me to create an Eloquent model for the data, so I'm using database transactions to improve performance (which takes ingestion time down from an hour to half an hour). This works fine as a solution except for the fact that it adds an extra few hundred rows to the table. Even weirder, when I reverse the sorting order to check the last row, it has an id of 1000000 (1 million) as expected, indicating no extra rows have been added.

What's going on here?

Hashim Aziz
  • 4,074
  • 5
  • 38
  • 68
  • 1
    Are the extra rows duplicates of other rows? – Barmar Feb 24 '23 at 21:17
  • @Barmar Not according to a SQL [command](https://stackoverflow.com/a/69193088/1191147) I ran on all the columns. – Hashim Aziz Feb 24 '23 at 22:53
  • @nnichols It does, yes. I was going by PHPMyAdmin's row count, but I'm only now noticing a hint on that column that when clicked says: "May be approximate. Click the numbers to get the exact count." Sure enough clicking on the number also reveals it to be 1,000,000. It's confusing that PHPMyAdmin doesn't just count the rows, but either way I know for next time. Feel free to post this as an answer and I'll accept. – Hashim Aziz Feb 24 '23 at 22:56
  • 2
    Because counting the rows can be slow in a large table. – Barmar Feb 24 '23 at 22:57

1 Answers1

1

What makes you think you have extra rows?

If you are going by the row count shown in a "tool", it is probably displaying the approximated row count returned by SHOW TABLE STATUS.

Does SELECT COUNT(*) FROM terms return the 1M you are expecting?

user1191247
  • 10,808
  • 2
  • 22
  • 32