2

I have been looking for how to find a value in one line and return the value of another column in a CSV file.

This is my function and it works fine but in small files:

function find_user($filename, $id) {
    $f = fopen($filename, "r");
    $result = false;
    while ($row = fgetcsv($f, 0, ";")) {
        if ($row[6] == $id) {
            $result = $row[5];
            break;
        }
    }
    fclose($f);
    return $result;
}

The problem is that the actual file with which I must work has a size of 4GB. And the time it takes to search is tremendous.

Navigating through Stack Overflow, I found the following post: file_get_contents => PHP Fatal error: Allowed memory exhausted

There they give me the following function that (from what I understood) makes it easier for me to search for huge CSV values:

function file_get_contents_chunked($file,$chunk_size,$callback)
{
    try
    {
        $handle = fopen($file, "r");
        $i = 0;
        while (!feof($handle))
        {
            call_user_func_array($callback,array(fread($handle,$chunk_size),&$handle,$i));
            $i++;
        }

        fclose($handle);

    }
    catch(Exception $e)
    {
         trigger_error("file_get_contents_chunked::" . $e->getMessage(),E_USER_NOTICE);
         return false;
    }

    return true;
}

And the way of using it seems to be the following:

$success = file_get_contents_chunked("my/large/file",4096,function($chunk,&$handle,$iteration){
    /*
        * Do what you will with the {&chunk} here
        * {$handle} is passed in case you want to seek
        ** to different parts of the file
        * {$iteration} is the section fo the file that has been read so
        * ($i * 4096) is your current offset within the file.
    */

});

if(!$success)
{
    //It Failed
}

The problem is that I do not know how to adapt my initial code to work with the raised function to speed up the search in large CSVs. My knowledge in PHP is not very advanced.

halfer
  • 19,824
  • 17
  • 99
  • 186
Kokox
  • 519
  • 1
  • 9
  • 24
  • Reading the file in 4096 byte chunks probably won't speed things up much since you have to read through the entire file every time you want to search. You're better off importing the file into a database once, and using the DB's ability to quickly search itself. – kmoser Aug 28 '17 at 06:30
  • Try adding a length in argument 2 where the value is the length of the line and see if that makes a diff, for example ```$row = fgetcsv($f, 1024, ";")``` – crafter Aug 29 '17 at 09:55

1 Answers1

3

No matter how you read the file, there's no way to make search faster since you always have to scan every character while searching for the correct row and column. Worst case is when the row you're looking for is the last one in a file.

You should import your CSV to a proper indexed database and modify your application to further save new records to that database instead of a CSV file.

Here's a rudimentary example using SQLite. I created a CSV file with 100 million records (~5GB) and tested with it.

Create a SQLite database and import your CSV file into it:

$f = fopen('db.csv', 'r');
$db = new SQLite3('data.db');
$db->exec('CREATE TABLE "user" ("id" INT PRIMARY KEY, "name" TEXT,
    "c1" TEXT, "c2" TEXT, "c3" TEXT, "c4" TEXT, "c5" TEXT)');
$stmt = $db->prepare('INSERT INTO "user"
    ("id", "name", "c1", "c2", "c3", "c4", "c5") VALUES (?, ?, ?, ?, ?, ?, ?)');
$stmt->bindParam(1, $id, SQLITE3_INTEGER);
$stmt->bindParam(2, $name, SQLITE3_TEXT);
$stmt->bindParam(3, $c1, SQLITE3_TEXT);
$stmt->bindParam(4, $c2, SQLITE3_TEXT);
$stmt->bindParam(5, $c3, SQLITE3_TEXT);
$stmt->bindParam(6, $c4, SQLITE3_TEXT);
$stmt->bindParam(7, $c5, SQLITE3_TEXT);
$db->exec('BEGIN TRANSACTION');
while ($row = fgetcsv($f, 0, ';')) {
    list($c1, $c2, $c3, $c4, $c5, $name, $id) = $row;
    $stmt->execute();
}
$db->exec('COMMIT');

This takes a long time, over 15 minutes on my computer, resulting a 6.5GB file.

Search from a database:

$id = 99999999;
$db = new SQLite3('data.db');
$stmt = $db->prepare('SELECT "name" FROM "user" WHERE "id" = ?');
$stmt->bindValue(1, $id, SQLITE3_INTEGER);
$result = $stmt->execute();
print_r($result->fetchArray());

This executes virtually instantenously.

Joe
  • 1,656
  • 11
  • 10