0

I have a database with ~100000 rows and ~150 columns of data in the format:

data_id               data_name  monthly_data1  monthly_data2  monthly_data3    ""    ""
      0     product1_data-item1             20             30             10    ""    ""
      1     product1 data-item2             10             30             20    ""    ""
      2     product1 data-item3              9             10             23    ""    ""
      3     product2 data-item1             40             13             12    ""    ""
      4     product2 data-item2             31             12             32    ""    ""
      5     product2 data-item3             23             49             23    ""    ""

The data set above is a basic sample of the data, in reality, there are over 2000 products with 50+ data-items each (approx 100,000 rows) and approx 140 columns of data which is annual data.

I need to search the database for each data item - for each product (i.e. each row) and determine if the value in the columns month_data1 through month_data140 for each data-item is within a pre-determined minimum/ maximum range for that specific data item.

Here is the format of my code as is, it is working, just very slowly, approx 20 seconds to complete all 50 checks for each product for each year.

$numberProducts = 2000;
$numberLineItems = 50;
$numberMonths = 140;

for($m=0;$m<$numberMonths;$m++){
    for($p=0;$p$numberProducts;$p++){
      $dataMonth = 'data_month'.$m+1;
          $q="SELECT $dataMonth FROM product_table WHERE data_id='".($p*$numberLineItems)."'";
          $q=mysql_query($q);
          while($row=mysql_fetch_assoc($q)){
              $dataVal = $row[$dataMonth];
          }
          mysql_free_result($q);
          if(($dataVal>=$dataMin1)&&($dataVal<=$dataMax1)){
              $q="SELECT $dataMonth FROM product_table WHERE data_id='".($p*$numberLineItems+1)."'";
              $q=mysql_query($q);
              while($row=mysql_fetch_assoc($q)){
                  $dataVal = $row[$dataMonth];
              }
              mysql_free_result($q);
              if(($dataVal>=$dataMin2)&&($dataVal<=$dataMax2)){
                  $q="SELECT $dataMonth FROM product_table WHERE data_id='".($p*$numberLineItems+2)."'";
                  $q=mysql_query($q);
                  while($row=mysql_fetch_assoc($q)){
                      $dataVal = $row[$dataMonth];
                  }
                  mysql_free_result($q);
                  if(($dataVal>=$dataMin3)&&($dataVal<=$dataMax3)){
.
.
.

AND SO ON. All the way to the 50th data item for each product for each month, checking to see if the monthly value for that data item for that product is within a predetermined range --- The pre-determined range ( dataMin/dataMax) is different for each separate data item per product but the same per specific data item between products.

I am looking for a way to speed up the code, different query combination, server setting, loop style, etc. which may help to omptimize things and get down to just a few seconds required for output. ANY ideas would be appreciated.

My first thought was to change the select statement to select the entire database $q = "SELECT * FROM product_table"; and put the data into a multidimensional array to do the min/max checks and avoid the 14,000,000 queries but I get hit the "out of memory" limit.

There has got to be a better way...

IIIOXIII
  • 980
  • 3
  • 15
  • 30
  • 1
    Grab the results for the whole range in one go, analyze them afterwards? `SELECT data_id, $dataMonth FROM product_table WHERE data_id BETWEEN :min_id AND :max_id`. Oh, and [migrate away from `mysql_`](http://stackoverflow.com/questions/12859942/why-shouldnt-i-use-mysql-functions-in-php). – DCoder Jan 27 '13 at 11:38
  • 3
    It's great example what may happen if you put every month in separate column – Peter Jan 27 '13 at 11:39
  • Where is `$i` coming from? – Joachim Isaksson Jan 27 '13 at 11:39
  • @Joachim - oops, $i was a leftover, edited to $p – IIIOXIII Jan 27 '13 at 11:45
  • @DCoder - Maybe, the problem is that I need to know when/where data-item values do not meet fall within the range store them until they do again, do some calculations, etc. I.e. I need to be able to store/test each of the data items values separately. What if I changed the code so that I query items in 50 row chunks, into an array, and get the min/max results for each data-item for each product that way. It would reduce the number of queries by 50 times but not the number of if/for loops. Think it would help any? – IIIOXIII Jan 27 '13 at 11:54
  • 1
    Have you considered normalizing the schema so that each row holds data for a single month/data-item? Doing so would undoubtably simplify and optimize the speed of the code. If its an option, I'm sure we could help with suggestions on the new design. – kstevens715 Jan 27 '13 at 12:21
  • @Peter Szymkowski - Are you saying that it would be faster to query 14,000,000 rows of data rather than 100,000 rows by 140 columns? – IIIOXIII Jan 27 '13 at 18:21
  • 1
    @IIIOXIII Most likely, yes, since you can create an index on a single column quite easily, but 140 indexes (and I'm guessing growing) isn't easy to maintain or for the query optimizer to use. – Joachim Isaksson Jan 27 '13 at 18:22
  • @IIIOXIII No doubt about it. You would be able to put everything in one SQL query and get rows you are looking for straight away. This is why people are using SQL databases. At this moment you treat database like big excel file. – Peter Jan 27 '13 at 18:27
  • This seems to be a badly designed database. Looks to me that storing each month as a separate column is a bad decision. Maybe re-design the database and store each months result as a separate row in another database table? This will be a lot easier to analyze and have the database work for *you* not the other way round – thaJeztah Jan 27 '13 at 23:30

2 Answers2

0

You Can Try , As Like following ... I am sending without test, If you find out any syntax error, Please try with correction those syntax error ...

 $numberMonths = 140;
 $minRange = 20;
 $maxRange = 35;
 $dataItemArray = array();

 $q=mysql_query("SELECT * FROM product_table");
 while($row=mysql_fetch_assoc($q)){
         for($i = 1 ; $i <= numberMonths; $i++){
            $nowColumn = monthly_data$i;
            if(($row[$nowColumn] >= $minRange) AND ($row[$nowColumn] <= $maxRange))
                $dataItemArray = $row['data_name']
         }  

      }
  • Won't work. As I stated in my initial post, I already tried that, script runs out of memory with the selection of the 14,000,000 cells (100,000rows*140columns) – IIIOXIII Jan 27 '13 at 12:46
0

Here is one approach :

for ($i = 1; $i <= 50; $i++) {

    $$min = 'dataMin' . $i;
    $$max = 'dataMax' . $i;

    $dataMonth = 'data_month' . $i;

    //Query to get all the data_id's that fall in the given range
    //If you know that many of your data_month values will be in the min max range, 
    //you can use the opposite of this query.
    //i.e select those data_ids that are not in the range -- this result set 
    //will be significantly smaller and will consume much less time and memory
    /*
     * eg: 
     * $res = mysql_query('SELECT data_id 
                            FROM product_table 
                            WHERE ' . $dataMonth . ' > ' . $$max . ' OR ' . $dataMonth . ' < ' . $$min);
     */
    $res = mysql_query('SELECT data_id 
                            FROM product_table 
                            WHERE ' . $dataMonth . ' <= ' . $$max . ' AND ' . $dataMonth . ' >= ' . $$min);
    if (mysql_num_rows($res) > 0) {
        while ($row = mysql_fetch_assoc($res)) {
            //this arr will contain the data_ids who are in the given range for the moth $dataMonth
            $finalArr[$dataMonth][] = $row['data_id'];
        }
    }
    mysql_free_result($res);
    //$finalArr will have months as keys and data_ids who have the value in the specified range
}
Jaspal Singh
  • 1,210
  • 1
  • 12
  • 17