
I have 10 files containing anywhere from 50 to 250 rows each.

I need to be able to pull 1 or more random rows from different files.

I currently do this...

// Load the whole file into an array, one element per line.
$lines = file($filePath);
if ($number == 1) {
    // One row: pick a single random index.
    return $lines[array_rand($lines)];
} else {
    // Several rows: shuffle and take the first $number lines.
    shuffle($lines);
    return array_slice($lines, 0, $number);
}

However, I just read about a way to do the same thing using MySQL here:

https://stackoverflow.com/a/4329447/390480

Would I really see that much of a performance gain by moving these 10 files to a MySQL database and running that query instead?
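
For reference, the straightforward MySQL version of this would be something like the following (just my sketch, not the linked answer's exact code; the `phrases` table and `line` column are made-up names, and I'm using PDO):

// Assumed schema: a `phrases` table with one row per line of text.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// ORDER BY RAND() shuffles the whole table server-side; LIMIT keeps $number rows.
$stmt = $pdo->prepare('SELECT line FROM phrases ORDER BY RAND() LIMIT :n');
$stmt->bindValue(':n', (int)$number, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_COLUMN);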

Thanks!

user390480

3 Answers


In my experience, when you don't have much data, it's quicker to retrieve it from a file than from a database. In my case I have files where I store arrays with types of houses, for example, and it was faster to get them from the files. The files were only about 10 KB, and performance was roughly 10 times better. I may be wrong about the exact factor, but it was definitely fast enough that I deleted those tables and went back to the text files.

The best way to be sure about the performance is to use the microtime() function and time it both ways. Then you can see the difference in your own benchmark.

I use this a lot:

$start = microtime(true);

// ... the code you want to measure ...
// Unless the script is very long-running, put the code under test inside a
// loop (thousands of iterations or more, depending on the script), because
// momentary system load can skew a single run, as suggested by @Colin Morelli.

echo microtime(true) - $start;

This prints the elapsed time in seconds, as a float.
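
For example, to compare the two approaches head to head, the harness could look like this (a sketch; randomRowsFromFile() and randomRowsFromDb() are placeholders for your file-based and MySQL-based implementations):

$iterations = 10000;

// Time the file-based approach.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    randomRowsFromFile('phrases.txt', 5);
}
$fileTime = microtime(true) - $start;

// Time the database approach the same way.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    randomRowsFromDb($pdo, 5);
}
$dbTime = microtime(true) - $start;

printf("file: %.4fs, db: %.4fs (avg per call: %.6fs vs %.6fs)\n",
    $fileTime, $dbTime, $fileTime / $iterations, $dbTime / $iterations);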

aleation
  • Just to comment: when benchmarking you should almost always run a loop multiple times. There are too many variables in a single run. For example, what if something on your machine slams the CPU during that portion of the benchmark? It's generally best to run the loop at least a few hundred times (significantly more for simple tasks) and then take the average execution time across all runs – Colin M Mar 19 '13 at 12:45
  • Yes, I know; it's actually in my comment that I usually use loops. The project I'm working on uses a lot of resources and takes up to 5 minutes to run the script, which is why I don't use loops on the long ones :P – aleation Mar 19 '13 at 12:48
  • I know, but you said "sometimes, if it's a very simple task." I just commented to say that you should really _always_ loop your benchmarks. I have yet to see a benchmarkable scenario that takes more than a second per run. Remember, when you benchmark you're not benchmarking the _entire_ script (talk about variables); you're benchmarking one specific piece of it to see how it performs. – Colin M Mar 19 '13 at 12:50

Doing it in the database should be faster than reading the whole file and shuffling it. But you first have to insert the data into the database, which creates additional work. You should check manually which option is faster.

BUT remember one more thing: when the files get large, memory usage can be huge, because file() reads the entire file into an array. You should think about that too.
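
If memory ever becomes a problem, one standard workaround (my sketch, not part of the question's code) is reservoir sampling, which picks $k random lines in a single pass without holding the whole file in memory:

// Pick $k random lines from a file in one pass (reservoir sampling).
function randomLines($filePath, $k) {
    $reservoir = [];
    $seen = 0;
    $handle = fopen($filePath, 'r');
    while (($line = fgets($handle)) !== false) {
        $seen++;
        if (count($reservoir) < $k) {
            // Fill the reservoir with the first $k lines.
            $reservoir[] = $line;
        } else {
            // Replace a random slot with probability $k / $seen.
            $j = mt_rand(0, $seen - 1);
            if ($j < $k) {
                $reservoir[$j] = $line;
            }
        }
    }
    fclose($handle);
    return $reservoir;
}

Each line ends up in the result with equal probability, and memory stays proportional to $k rather than to the file size.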

Kasyx
  • It's not true that it "should be faster". Databases read from disk too, and there may be a lot more overhead to get at the data. Also, ORDER BY RAND(), which is what's required here, is slow. It **may** be faster, it may not. – Popnoodles Mar 19 '13 at 12:50
  • Yes, that's true, but the database is very well optimized. There is also `RAND()`, which generates a temp table, but `RAND()` on a table is faster than `shuffle()` on a PHP array. – Kasyx Mar 19 '13 at 14:58
  • No, in the real world RAND() is not faster than shuffle(). PHP can shuffle and slice 30,000 IDs in 0.003 seconds. Show me a realistically priced database server that can outperform that with RAND() and LIMIT. Read the comments on this answer: http://stackoverflow.com/questions/15435560/get-10-rows-from-a-query/15435668?noredirect=1#comment21906321 – Popnoodles Mar 20 '13 at 11:46

10 × 250 = 2,500 rows; that's nothing for MySQL. You can do anything you want with this data and it will all be fast. And the database will definitely work faster than files. Alternatively, you can store the rows in Redis and use SRANDMEMBER.
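
With the phpredis extension that would look roughly like this (a sketch; the 'phrases' key name and connection details are assumptions):

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// One-time load: each line of the file becomes a member of a Redis set.
foreach (file('phrases.txt', FILE_IGNORE_NEW_LINES) as $line) {
    $redis->sAdd('phrases', $line);
}

// SRANDMEMBER with a count returns that many distinct random members.
$rows = $redis->sRandMember('phrases', 5);

Note that a set deduplicates its members, so identical lines collapse into one entry.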

OZ_