1

My problem is the following: I have two arrays $first and $second of the same length, containing strings. Every string is given a positive value in a table named Fullhandvalues:

Field: board : string(7) PRIMARY KEY
Field: value : int (11)

I want to count how many times $first[$i] has a better value than $second[$i], how many times they have the same value, and how many times $first[$i] has a worse value than $second[$i].

What I have done now is getting all the values via

$values[0]= DB::table('Fullhandvalues')->where_in("board",$first)->get(Array("value"));
$values[1]= DB::table('Fullhandvalues')->where_in("board",$second)->get(Array("value"));

and then comparing the values. But this seems to be very slow (approximately 6 seconds, for an array length of 5000 and 50000 entries in the table)

Thanks very much in advance

EDIT: How I loop through them:

$win=0;$lose=0;$tie=0;
for($i=0;$i<count($values[0]);$i++)
    {
        if ($values[0][$i]>$values[1][$i])
            $win++;
        elseif ($values[0][$i]<$values[1][$i])
            $lose++;
        else $tie++;
    }
edi9999
  • 19,701
  • 13
  • 88
  • 127
  • Are your columns indexed? – David Feb 10 '13 at 23:03
  • Yep, there is an index on board – edi9999 Feb 10 '13 at 23:04
  • Sorry I see the index now. Can you post how you're looping through them? – David Feb 10 '13 at 23:05
  • Thanks - how long does the page take without the loop just to check? Although I have a strong feeling it is the query - as string comparisons are extremely slow, and you're essentially seeing if each row "board" is in the provided where_in clause, which has 5000 of them. It may be more ideal to come up with a system that allows integer based comparison (e.g. convert the base to 10 and store that). – David Feb 10 '13 at 23:12
  • The loop takes 0.06s, the request 5.4s, the rest of the code 0.2s. That's why I think I should change my request first. – edi9999 Feb 10 '13 at 23:15
  • 1
    As mentioned then - it is down to the comparison of strings essentially. String comparison (or searching) is EXTREMELY slow in comparison to indexed integers. The best solution I can think of right now is to store it in an integer format, like converting it from it's current base into base 10, then converting it back when you need to. – David Feb 10 '13 at 23:17
  • 1
    One thing you could try is retrieving all 50000 rows as an associative array indexed on board. Then, you can compare your two arrays by using them as keys into this associative array. It might be less of a load on the server if it doesn't have to pick out thousands of specific rows and return only those. Of course, a change as big as this will need profiling to see if it's faster or not :) – Patashu Feb 10 '13 at 23:20
  • I have tried by converting the boards to ints (going from 1 to 1123046875), but it became even slower (request= 8.2s). I am now trying to hardcode it. – edi9999 Feb 11 '13 at 12:25
  • Thank you very much @Patashu, I tried by putting an associative array into my code, and now the total execution time is 0.25s . I'm working on it to become even faster, keeping this solution. – edi9999 Feb 11 '13 at 19:09

1 Answers1

1

Your problem is where_in. You are basically building a query with the length of implode(',', $second) (plus change). This has to be first generated by Laravel (PHP) and then analysed by your DBMS.

Also the generated query will use the IN(...) expression, which is known to be slow in MySQL.

Without further information about the application and how board IDs are selected, here is an option you have:

  • Create a temp-table and fill it with your array data (this should be quite fast, but preferably this data should already be in the database)
  • Don't forget to create an index on the temp table.
  • Select with an inner join.
dualed
  • 10,262
  • 1
  • 26
  • 29
  • If I want to INSERT the data `$first` and `$second`, I will have to implode them too, and the DB will have to analyse everything too? I don't see the difference. – edi9999 Feb 11 '13 at 12:26
  • I'd like to see your evidence of IN(...) being slow in MySQL, it out performs any alternative SQL syntax for this type of thing (e.g. multiple ORs), as seen: http://stackoverflow.com/questions/782915/mysql-or-vs-in-performance and – David Feb 12 '13 at 00:34
  • @David It ***is*** slow, if you need evidence just ask Google or look at the performance increase of the QA who downed the query duration by 95% even though he selected 10 times as many rows. And I offered a faster solution which is joins and which is in my answer. – dualed Feb 12 '13 at 10:53
  • @edi9999 You are absolutely right. However, I am too. Preferably you create the temp table not as often as you do the selects. And in some cases even if you do, creating a temp table offers a performance boost. Like I said without knowing more, this is what I can offer you as a solution. – dualed Feb 12 '13 at 11:02
  • @dualed I have googled it and can still see no evidence that suggests IN is slower than the alternatives to the syntax. IN does have a major performance lose when having a sub query within it. Grabbing all rows is quicker than having to find a section or group of them, he just moved the load onto PHP. Everyone knows MySQL servers are resource hogs, so when anything can be made so it doesn't rely heavily on the queries it is more ideal. If we really want to get into it, we could also start saying he should use X database engine as it performs better for reading from tables. – David Feb 12 '13 at 14:40
  • @dualed but at the end of the day, his problem is now solved, so no more needs to be said. (Comment was too long to post it all in one!). – David Feb 12 '13 at 14:40
  • @David this is not completely true. While currently with 50000 to 5000 the QA may have acceptable PHP performance, this can quickly change with a growing database. And while MySQL can run the query reasonably fast and scalable on B-Tree indexes (and in this case even faster on Hash-Indexes), PHP does not. Here is a link for you to give you your "proof" http://explainextended.com/2009/08/18/passing-parameters-in-mysql-in-list-vs-temporary-table/ took me 20s of google... Now where would your proof be that "***everyone** knows MySQL servers are resource hogs*"? – dualed Feb 12 '13 at 16:08
  • @dualed Saw that on google but was skim reading, and hence read the last part which basically said they were the same, my mistake. The MySQL servers I have seen in action for companies hosting several hundred sites show their resource usage at 80%~ throughout the day (This server being dedicated for MySQL, which is more beneficial than using one on the same server). However - I was obviously wrong with my statement as you did not seem to know this (making the everyone part the failing issue). I didn't say they were more resource intensive than other DB Systems but they are quite resource heavy – David Feb 12 '13 at 16:21