
I'm looking for a very fast method to read a CSV file. My data structure looks like this:

timestamp ,float     , string    ,ip          ,string
1318190061,1640851625, lore ipsum,84.169.42.48,appname

and I'm using fgetcsv to read this data into arrays.

The problem: Performance. On a regular basis the script has to read (and process) more than 10,000 entries.

My first attempt is very simple:

//Performance: 0.141 seconds / 13.5 MB

while (!feof($statisticsfile)) {
    $temp = fgetcsv($statisticsfile);
    $timestamp[] = $temp[0];
    $value[] = $temp[1];
    $text[] = $temp[2];
    $ip[] = $temp[3];
    $app[] = $temp[4];
}

My second attempt:

//Performance: 0.125 seconds / 10.8 MB

while (($userinfo = fgetcsv($statisticsfile)) !== FALSE) {
    list($timestamp[], $value[], $text, $ip, $app) = $userinfo;
}
  1. Is there any way to improve performance even further, or is my method already as fast as it can get?
  2. Probably more important: is there any way to define which columns are read? E.g. sometimes only the timestamp and float columns are needed. Is there any better way than mine (have a look at my second attempt, and the sketch below :)
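
For illustration, here is a stripped-down variant of my second attempt that keeps only the first two columns; list() with fewer targets than there are columns simply ignores the rest:

while (($userinfo = fgetcsv($statisticsfile)) !== FALSE) {
    // only timestamp and float value are stored, the remaining columns are skipped
    list($timestamp[], $value[]) = $userinfo;
}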

Thanks :)

Nyoman
  • 10,000 lines of CSV is ***NOTHING***. The modifications you're doing are going to shave (at most) a few milliseconds off a process that's only taking deciseconds anyway. Ask this kind of micro-optimization question when you reach millions or billions of rows. – Marc B Oct 09 '11 at 21:07
  • Please go through the many existing [questions on SO showing how to read CSV files with PHP](http://stackoverflow.com/search?q=read+csv+file+php). Take the examples given in answers and benchmark/profile them under your real world conditions. Then use what is fastest for your scenario. – Gordon Oct 09 '11 at 21:11
  • @Gordon Could you please point out a faster/better example from these questions. I've read them the past couple of days and up there is the result. – Nyoman Oct 09 '11 at 21:17
  • @Nyoman even if I would be willing to do your work for you, I couldnt because I cannot profile the various possible solutions (`fgetcsv`, `implode`, `SplFileObject`, `sscanf`) in your real world environment. You have to do it yourself. – Gordon Oct 09 '11 at 21:30
  • `$data = array_map("str_getcsv", file($filename));` is the speediest method, as it reads in the whole CSV file at once. – mario Oct 09 '11 at 21:47
  • If you don't need to keep the whole parsed file in memory (as you do in your example, appending each row to arrays with $col[] = $field), processing one row at a time would improve performance... –  Oct 09 '11 at 21:50
  • @mario Unfortunately, str_getcsv is not available... – Nyoman Oct 09 '11 at 22:07
  • @Nyoman: There are a couple of emulations in the manual, PHP_Compat, or upgradephp. Albeit you won't get the performance advantage without the builtin version. – mario Oct 09 '11 at 22:18
  • @wes Could you give me a hint? I'm not quite getting your approach... – Nyoman Oct 09 '11 at 22:29
  • @Nyoman in your example you're storing all the timestamps in one array (list($timestamp[] ...)). The $timestamp[] array (and the other ones) will be a huge cause of slowdowns, but maybe you don't need to keep the entire timestamp list in one array to do what you're trying to do. –  Oct 10 '11 at 09:38
  • @Nyoman related to my previous comment, I can suggest an alternative for running complex queries against a pure-PHP implementation: you can use PDO drivers to read and execute complex queries on CSV databases, or you can read the CSV as you do now (with fgetcsv) and put it into an SQLite3 in-memory database (http://it.php.net/manual/en/sqlite3.construct.php); see the sketch below. –  Oct 10 '11 at 09:45
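
To make that last suggestion concrete, here is a minimal sketch of the fgetcsv-into-SQLite3 approach. It assumes the sqlite3 extension (PHP 5.3+) is available; the file name statistics.csv and the table layout are made up to match the sample data above:

// build an in-memory SQLite3 database from the CSV
$db = new SQLite3(':memory:');
$db->exec('CREATE TABLE stats (ts INTEGER, val REAL, txt TEXT, ip TEXT, app TEXT)');
$stmt = $db->prepare('INSERT INTO stats VALUES (?, ?, ?, ?, ?)');

$statisticsfile = fopen('statistics.csv', 'rb');
$db->exec('BEGIN'); // one transaction around the inserts keeps them fast
while (($row = fgetcsv($statisticsfile)) !== FALSE) {
    $stmt->bindValue(1, (int)$row[0], SQLITE3_INTEGER);
    $stmt->bindValue(2, (float)$row[1], SQLITE3_FLOAT);
    $stmt->bindValue(3, $row[2], SQLITE3_TEXT);
    $stmt->bindValue(4, $row[3], SQLITE3_TEXT);
    $stmt->bindValue(5, $row[4], SQLITE3_TEXT);
    $stmt->execute();
    $stmt->reset();
}
$db->exec('COMMIT');
fclose($statisticsfile);

// afterwards, fetch only the columns that are actually needed
$result = $db->query('SELECT ts, val FROM stats');
while ($r = $result->fetchArray(SQLITE3_NUM)) {
    // $r[0] = timestamp, $r[1] = float value
}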

2 Answers


How long is the longest line? Pass that as the second parameter to fgetcsv() and you'll see the greatest improvement.
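
A minimal sketch of this, assuming no line in the file exceeds 4096 bytes (pick any value safely above your longest line; the file name is made up):

$statisticsfile = fopen('statistics.csv', 'rb');
while (($userinfo = fgetcsv($statisticsfile, 4096)) !== FALSE) {
    // the length hint saves fgetcsv() from scanning for arbitrarily long lines
    list($timestamp[], $value[], $text, $ip, $app) = $userinfo;
}
fclose($statisticsfile);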


Check how long PHP takes just to read the file:

If it is big, move the file to a ramdisk or an SSD.

  1. [..]sometimes only the timestamp

Something like this:

preg_match_all('#(\d{10}),([\d.]+), (.*?),\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3},appname#', $f, $res);

print_r($res);
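
Note that $f must hold the entire file contents for this to work; a usage sketch (file name assumed):

$f = file_get_contents('statistics.csv');
preg_match_all('#(\d{10}),([\d.]+), (.*?),\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3},appname#', $f, $res);
// $res[1] holds only the timestamps, $res[2] only the float values
print_r($res[1]);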
user956584