
The goal is to scan product descriptions and identify the product options (characteristics) they mention. The input data has the following structure:

// TABLE WITH INPUT DATA. STRUCTURE: PRODUCT_CATEGORY [0], PRODUCT_NUMBER[1], DESCRIPTION OF AN OPTION [2]. THE INPUT DATA TABLE CAN CONSIST OF UP TO 400-500 ROWS

$input_product_data = array (
array('AAAA','1111','Chimney with red bricks in the center of the room'),
array('BBBB','2222','Two wide windows in the main floor'),
array('BBBB','2233','Plastic window has to be changed later'),
array('CCCC','3333','Roof tiles renewed in 2015'),
array('NULL','4444','Floor has been renovated for two years. Currently it has ground in wood.'),
array('NULL','NULL','Beautiful door in green color built at begin of 20th century')

);

There are 3 different constellations that indicate a product option:

  1. Only by search string within product description,

Example:
Input data: array('NULL','NULL','Beautiful door in green color built at begin of 20th century')
Search string: 'green color' within PRODUCT_DESCRIPTION
Result: Available

  2. By search string within product description + product category:

Example:
Input data: array('CCCC','NULL','Roof tiles renewed in 2015'),
Search strings: 'CCCC' within PRODUCT_CATEGORY + 'green color' within PRODUCT_DESCRIPTION
Result: Available

  3. By search string within product description + product category + product number.

Example:
Input data: array('AAAA','1111','Chimney with red bricks in the center of the room')
Search strings: 'AAAA' within PRODUCT_CATEGORY + '1111' within PRODUCT_NUMBER + 'Chimney' within PRODUCT_DESCRIPTION
Result: Available

IMPORTANT:

  • The table with input data per product can consist of up to 450 description rows.
  • Several search strings can be defined for the same option (e.g. 10 different search strings for the option "Windows" like "windows in the floor", "big windows", "window without glas" etc.).
  • The initial set of rules (combinations of product description + product category + product number) will consist of ca. 3000 rows and will be extended continuously by business users.

REALIZATION VARIANT A (by use of preg_match):

// TABLE FOR PRODUCT OPTIONS. STRUCTURE: ID[0], OPTION NAME[1], OPTION CATEGORY[2], OPTION-FAMILY[3], PROD.-NR[4], REG. EXPRESSION[5], PRIORITY[6], OUTPUT[7]

$ct_product_options = array (
  array('0001', 'Chimney', 'Additional options', '/^AAAA/', '/9999/', '/^Chimney with./', '0', 'Available'),
  array('0002', 'Material of ground floor', 'Additional options', '/NULL/', '/^4444$/', '/.wood./', '0', 'Wood'), 
  array('0003', 'Roof tiles', 'Basic options', '/^CCCC/', '/0022/', '/^Roof tiles./', '0', 'Available'),
  array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/.window$/', '0', 'Available'),
  array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/.wide windows./', '0', 'Available'), 
  array('0005', 'Door color', 'Basic options', '/NULL/', '/NULL/', '/green/', '0', 'Green'), 
  array('0006', 'Air condition', 'Additional options', '/NULL/', '/NULL/', '/^Air condition made in Japan/', '0', 'Green')
);

// FOR LOOP TO MAKE COMPARISON BETWEEN INPUT PRODUCT DATA AND PREDEFINED CUST. STRINGS

$matches_array = array();

foreach ($input_product_data as [$product_family, $product_number, $product_description]) {
    foreach($ct_product_options as [$option_id, $option_name, $option_category, $product_family_reg_exp, $product_number_reg_exp, $regular_expression, $priority, $output]) {
        
   if (preg_match($regular_expression, $product_description) == 1
   &&  preg_match($product_family_reg_exp, $product_family) == 1 ||
   
       preg_match($regular_expression, $product_description) == 1
   &&  preg_match($product_number_reg_exp, $product_number) == 1) {
    
    $matches_array [] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output"=> $output);
    
    
    } 
    
    else {

   if (empty($product_family) && empty($product_number)) {

   if (preg_match($regular_expression, $product_description) == 1) {
    
    $matches_array [] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output"=> $output);
    
   }
   }
    }   
  }
}

//echo "<pre>";
//print_r($matches_array);

// FUNCTION TO DELETE DUPLICATES FROM ARRAY WITH MATCHES

function unique_multidimensional_array($array, $key) {
$temp_array = array();
$i = 0;
$key_array = array();

foreach($array as $val) {
    if (!in_array($val[$key], $key_array)) {
        $key_array[$i] = $val[$key];
        $temp_array[$i] = $val;
    }
    $i++;
}
return $temp_array;
}

//echo "<br><h3>UNIQUE MATCHES</h3>";

// CALL OF THE FUNCTION TO GET UNIQUE MATCHES

$unique_matches = unique_multidimensional_array($matches_array, 'id');
sort($unique_matches);
//echo "<pre>";
//print_r($unique_matches);

// CALL OF THE FUNCTION TO CREATE LIST/ARRAY WITH ALL AVAILABLE PRODUCT OPTIONS

$list_all_product_options = unique_multidimensional_array($ct_product_options, 0);
$list_all_product_options_short = array();

foreach ($list_all_product_options as $option_item) {
    $list_all_product_options_short[] =  array("id" => $option_item[0], "option_name" => $option_item[1], "option_category" => $option_item[2]);
}

sort($list_all_product_options_short);

//echo "<h3>LIST WITH ALL PRODUCT OPTIONS (SHORT VERSION)</h3>\n";
//echo "<pre>";
//print_r($list_all_product_options_short);



$unique_matches = array_column($unique_matches, null, 'id');

foreach ($list_all_product_options_short as $key => $value) {
    if (isset($unique_matches[$value['id']])) {
        $result[$key] = array_merge($value, $unique_matches[$value['id']]);
    } else {
        $result[$key] = array_merge($value, ['output' => 'Not available']);
    }
}

echo "<h3>FINAL RESULTS</h3>\n";

//echo "<pre><br>\n";
print_r($result);

The variant implemented with preg_match works well and offers good flexibility in how the regexes are defined. E.g., instead of spelling out the whole product number "2222" I can use just "/^2.../". I can also combine several regexes in one row with "|" (e.g. ".wide windows. | some window | etc."). The problem is that with the real data volume (500 rows in $input_product_data and 3000 rows in $ct_product_options) the code is quite slow.
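One caveat when combining alternatives with "|": in a PCRE pattern, spaces around the "|" are literal characters unless the `x` modifier is set. The following is a minimal sketch using one of the sample descriptions; the variable names are illustrative only.

```php
<?php
// Description taken from the sample input data.
$description = 'Two wide windows in the main floor';

// Spaces around "|" are literal here: the first branch needs a space after
// the end of the string, the second needs a space before the leading ".".
$withSpaces = '/.window$ | .wide windows./';

// The same alternatives without the stray spaces.
$compact = '/.window$|.wide windows./';

// With the "x" modifier PCRE ignores unescaped whitespace in the pattern,
// so an intended space has to be escaped as "\ ".
$extended = '/ .window$ | .wide\ windows. /x';

var_dump(preg_match($withSpaces, $description));
var_dump(preg_match($compact, $description));
var_dump(preg_match($extended, $description));
```

Only the second and third patterns match this description, so a multi-alternative rule should either avoid padding around "|" or use the `x` modifier consistently.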

REALIZATION VARIANT B (by use of stripos):

// INPUT DATA WITH PRODUCT DESCRIPTION. STRUCTURE: PROD. FAMILY, PROD. NUMBER, PRODUCT DESCRIPTION

$input_product_data = array (
array('AAAA','1111','Chimney with red bricks in the center of the room'),
array('BBBB','2222','Two wide windows in the main floor'),
array('BBBB','2233','Plastic window has to be changed later'),
array('CCCC','3333','Roof tiles renewed in 2015'),
array('NULL','4444','Floor has been renovated for two years. Currently it has ground in wood.'),
array('NULL','NULL','Beautiful door in green color built at begin of 20th century')

);

// CUSTOMIZING TABLE FOR PRODUCT OPTIONS. STRUCTURE: ID[0], OPTION NAME[1], OPTION CATEGORY[2], OPTION-FAMILY[3], PROD.-NR[4], REG. EXPRESSION[5], PRIORITY[6], OUTPUT[7]

$ct_product_options = array (
  array('0001', 'Chimney', 'Additional options', 'AAAA', '9999', 'Chimney with', '0', 'Available'),
  array('0002', 'Material of ground floor', 'Additional options', 'NULL', '4444', 'wood', '0', 'Wood'), 
  array('0003', 'Roof tiles', 'Basic options', 'CCCC', '0022', 'Roof tiles', '0', 'Available'),
  array('0004', 'Windows', 'Basic options', 'BBBB', '2222', 'window', '0', 'Available'),
  array('0004', 'Windows', 'Basic options', 'BBBB', '2222', 'wide windows', '0', 'Available'), 
  array('0005', 'Door color', 'Basic options', 'NULL', 'NULL', 'green', '0', 'Green'), 
  array('0006', 'Air condition', 'Additional options', 'NULL', 'NULL', 'Air condition made in Japan', '0', 'Green') 
);

// IMPORTANT: THE REG. EXPRESSIONS CAN BE DEFINED MANY TIMES (e. g. 10 DIFFERENT REG. EXPRESSIONS FOR WINDOW). POINTS "." REPRESENT EMPTY SPACES WHICH ARE IMPORTANT TO IDENTIFY AN OPTION EXACTLY.


// FOR LOOP TO MAKE COMPARISON BETWEEN INPUT PRODUCT DATA AND PREDEFINED CUST. STRINGS

$matches_array = array();

foreach ($input_product_data as [$product_family, $product_number, $product_description]) {
    foreach($ct_product_options as [$option_id, $option_name, $option_category, $product_family_reg_exp, $product_number_reg_exp, $regular_expression, $priority, $output]) {
        
   if (stripos($product_description, $regular_expression) !== false
   &&  stripos($product_family, $product_family_reg_exp)  !== false ||
   
       stripos($product_description, $regular_expression) !== false
   &&  stripos($product_number, $product_number_reg_exp) !== false) {
    
    $matches_array [] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output"=> $output);
    
    
    } 
    
    else {

   if (empty($product_family) && empty($product_number)) {

   if (stripos($product_description, $regular_expression) !== false) {
    
    $matches_array [] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output"=> $output);
    
   }
   }
    }   
  }
}

//echo "<pre>";
//print_r($matches_array);

// FUNCTION TO DELETE DUPLICATES FROM ARRAY WITH MATCHES

function unique_multidimensional_array($array, $key) {
$temp_array = array();
$i = 0;
$key_array = array();

foreach($array as $val) {
    if (!in_array($val[$key], $key_array)) {
        $key_array[$i] = $val[$key];
        $temp_array[$i] = $val;
    }
    $i++;
}
return $temp_array;
}

//echo "<br><h3>UNIQUE MATCHES</h3>";

// CALL OF THE FUNCTION TO GET UNIQUE MATCHES

$unique_matches = unique_multidimensional_array($matches_array, 'id');
sort($unique_matches);
//echo "<pre>";
//print_r($unique_matches);

// CALL OF THE FUNCTION TO CREATE LIST/ARRAY WITH ALL AVAILABLE PRODUCT OPTIONS

$list_all_product_options = unique_multidimensional_array($ct_product_options, 0);
$list_all_product_options_short = array();

foreach ($list_all_product_options as $option_item) {
    $list_all_product_options_short[] =  array("id" => $option_item[0], "option_name" => $option_item[1], "option_category" => $option_item[2]);
}

sort($list_all_product_options_short);

//echo "<h3>LIST WITH ALL PRODUCT OPTIONS (SHORT VERSION)</h3>\n";
//echo "<pre>";
//print_r($list_all_product_options_short);

// ::::::::::::::::::::::::::::::::::

$unique_matches = array_column($unique_matches, null, 'id');

foreach ($list_all_product_options_short as $key => $value) {
    if (isset($unique_matches[$value['id']])) {
        $result[$key] = array_merge($value, $unique_matches[$value['id']]);
    } else {
        $result[$key] = array_merge($value, ['output' => 'Not available']);
    }
}



echo "<h3>FINAL RESULTS</h3>\n";

//echo "<pre><br>\n";
print_r($result);

It works much faster, but does not provide the flexibility of regex.

So, my questions:

  • Do you see any way to optimize VARIANT A to make it faster, or VARIANT B to make it more flexible?

  • Special question: How can I add the logic for the parameter PRIORITY from the table $ct_product_options?

The business logic for it is as follows: by default all rows/rules have priority "0". But some of them will get a priority ">0" (e.g. "1" or "2" etc.). The rule with the highest priority should overwrite the other rules.

E. g.

This rule with priority "0" identifies windows in the house.

  array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/.wide windows./', '0', 'Available')

At the same time, this rule with priority "1" tells us that the windows are no longer available. That means we have to get "Not available" in the final results.

  array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/^Windows have been removed from the whole building last year/', '1', 'Not available')
  • This is a lot to read; it will take some time to digest it all. I don't think you mean `'green color'` in scenario #2 – mickmackusa Aug 20 '22 at 03:41
  • Spacing is not enough to separate logic. Even when it is, I recommend parenthetical grouping @ `preg_match($regular_expression, $product_description) == 1 && preg_match($product_family_reg_exp, $product_family) == 1 || preg_match($regular_expression, $product_description) == 1 && preg_match($product_number_reg_exp, $product_number) == 1` Using parenthetical grouping helps to avoid bugs and developer misunderstandings. – mickmackusa Aug 20 '22 at 03:45

1 Answer

Before optimizing the variants, I should explain how I would implement a solution that generates the intended array.

I ran your code to better understand what the result should be. But instead of using print_r, I did this:

echo json_encode($result,  JSON_PRETTY_PRINT);

I got this:

[
    {
        "id": "0001",
        "option_name": "Chimney",
        "option_category": "Additional options",
        "output": "Available"
    },
    {
        "id": "0002",
        "option_name": "Material of ground floor",
        "option_category": "Additional options",
        "output": "Wood"
    },
    {
        "id": "0003",
        "option_name": "Roof tiles",
        "option_category": "Basic options",
        "output": "Available"
    },
    {
        "id": "0004",
        "option_name": "Windows",
        "option_category": "Basic options",
        "output": "Available"
    },
    {
        "id": "0005",
        "option_name": "Door color",
        "option_category": "Basic options",
        "output": "Green"
    },
    {
        "id": "0006",
        "option_name": "Air condition",
        "option_category": "Additional options",
        "output": "Not available"
    }
]

I noticed each array element is an element from $ct_product_options mapped to some format. So, I used array_map like this:

$result = array_map(
    fn($option) => [
        'id' => $option[0],
        'option_name' => $option[1],
        'option_category' => $option[2],
        'output' => get_option_output($option, $input_product_data),
    ],
    $ct_product_options
);

Now I have to implement get_option_output. I think all those nested foreach and if statements in both variants A and B make the code hard to understand (on top of how each line is indented). If I understand your intentions correctly, this condition looks like it has a bug:

if (
  preg_match($regular_expression, $product_description) == 1
  && preg_match($product_family_reg_exp, $product_family) == 1 ||
  preg_match($regular_expression, $product_description) == 1
  && preg_match($product_number_reg_exp, $product_number) == 1) {

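And you wanted to do something like the following. (Since && binds more tightly than || in PHP, the condition above is in fact already evaluated this way, but the explicit grouping makes the intent clear.)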

$productDescriptionMatches = preg_match($regular_expression, $product_description);
if (
  (
    $productDescriptionMatches
    && preg_match($product_family_reg_exp, $product_family)
  ) || (
    $productDescriptionMatches
    && preg_match($product_number_reg_exp, $product_number)
  )
) {

Which is equivalent to:

if (
  preg_match($regular_expression, $product_description)
  && (
    preg_match($product_family_reg_exp, $product_family)
    || preg_match($product_number_reg_exp, $product_number)
  )
) {

If I counted everything correctly, and assuming you made that mistake, I believe you want something like this:

function some($array, $callback)
{
    foreach ($array as $item) {
        if ($callback($item)) {
            return $item;
        }
    }
    return false;
}

function get_option_output($option, $products)
{
    $found = some(
        $products,
        fn($product) =>
            (
                preg_match($option[5], $product[2])
                && (
                    preg_match($option[3], $product[0])
                    || preg_match($option[4], $product[1])
                    || (
                      empty($product[0])
                      && empty($product[1])
                    )
                )
            )
    );
    return $found ? $option[7] : 'Not available';
}

$result = array_map(
    fn($option) => [
        'id' => $option[0],
        'option_name' => $option[1],
        'option_category' => $option[2],
        'output' => get_option_output($option, $input_product_data),
    ],
    $ct_product_options
);

On average, that code took 0.0000189903259277 seconds to execute. I ran 10,000 iterations.

Variant A took on average 0.0000316595554352 seconds. Variant B took on average 0.0000314178943634 seconds.
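The timing harness itself is not shown above. A minimal sketch of how such per-call averages could be measured follows; `average_seconds` is a hypothetical helper name and the workload is only a placeholder, so the numbers it prints are not comparable to the figures quoted here.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $callback.
function average_seconds(callable $callback, int $iterations = 10000): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds (PHP 7.3+)
    for ($i = 0; $i < $iterations; $i++) {
        $callback();
    }
    // Convert the elapsed nanoseconds to seconds, then average per call.
    return (hrtime(true) - $start) / 1e9 / $iterations;
}

// Placeholder workload: one regex check against a sample description.
$seconds = average_seconds(
    fn() => preg_match('/green/', 'Beautiful door in green color')
);
printf("%.10f seconds per call\n", $seconds);
```

To compare variants, each one would be wrapped in its own closure and measured with the same iteration count on the same data.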

The code I provided has no explicit nested loops, and it doesn't need to remove repeated elements or sort them twice. But it's possible to make it run faster:

$result = [];
foreach ($ct_product_options as $option) {
    $output = null; // reset once per option so an earlier match cannot leak through
    foreach ($input_product_data as $product) {
        $isAvailable =
            (
                preg_match($option[5], $product[2])
                && (
                    preg_match($option[3], $product[0])
                    || preg_match($option[4], $product[1])
                    || (
                      empty($product[0])
                      && empty($product[1])
                    )
                )
            );
        if ($isAvailable) {
            $output = $option[7];
            break;
        }
    }
    $result []= [
        'id' => $option[0],
        'option_name' => $option[1],
        'option_category' => $option[2],
        'output' => $output ?? 'Not available',
    ];
}

It took, on average, 0.0000132960796356 seconds. But it's harder to understand.

That answers the first question. Use array_map.

It also helps to answer the special question: change the function get_option_output accordingly.

If priority is the regular expression that should be used (and all the others should be ignored), then do something like this (also check if the priority is valid):

function get_option_output($option, $products)
{
    $priority = (int)$option[6];
    $found = some( // some() is the helper defined above; find() does not exist here
        $products,
        fn($product) => preg_match(
            $option[3 + $priority],
            $product[$priority]
        )
    );
    return $found ? $option[7] : 'Not available';
}

If the one with the highest priority should be checked first, and the others should also be checked:

function some($array, $callback)
{
    foreach ($array as $index => $item) {
        if ($callback($item, $index)) {
            return true;
        }
    }
    return false;
}

function get_option_output($option, $products)
{
    $priority = (int)$option[6];
    $found = some(
        $products,
        fn($product) =>
            preg_match($option[3 + $priority], $product[$priority])
            || some(
                $product,
                fn($text, $index) =>
                    $index !== $priority
                    && preg_match($option[3 + $index], $product[$index])
            )
    );
    return $found ? $option[7] : 'Not available';
}

If I misunderstood the details or something is missing, what I provided should still help.
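Another reading of PRIORITY, going by the business rule in the question (among the rules of one option ID, the matching rule with the highest priority decides the output), can be sketched as below. `rule_matches` and `resolve_options` are hypothetical names, the second product row is invented for illustration, and the sketch leaves out the "empty family and number" fallback to stay short.

```php
<?php
// Sample data in the layout used by the question (second row is made up).
$products = [
    ['BBBB', '2222', 'Two wide windows in the main floor'],
    ['BBBB', '2233', 'Windows have been removed from the whole building last year'],
];
$rules = [
    ['0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/.wide windows./', '0', 'Available'],
    ['0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/^Windows have been removed/', '1', 'Not available'],
    ['0005', 'Door color', 'Basic options', '/NULL/', '/NULL/', '/green/', '0', 'Green'],
];

// Hypothetical helper: does $rule match at least one product row?
function rule_matches(array $rule, array $products): bool
{
    foreach ($products as [$family, $number, $description]) {
        if (preg_match($rule[5], $description)
            && (preg_match($rule[3], $family) || preg_match($rule[4], $number))
        ) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: one result row per option ID. Among the matching
// rules of an ID, the one with the highest priority sets the output.
function resolve_options(array $rules, array $products): array
{
    $result = [];
    $bestPriority = [];
    foreach ($rules as $rule) {
        $id = $rule[0];
        $priority = (int) $rule[6];
        if (!isset($result[$id])) {
            $result[$id] = [
                'id' => $id,
                'option_name' => $rule[1],
                'option_category' => $rule[2],
                'output' => 'Not available',
            ];
        }
        // Skip the regex work when a higher or equal priority already won.
        $isBetter = !isset($bestPriority[$id]) || $priority > $bestPriority[$id];
        if ($isBetter && rule_matches($rule, $products)) {
            $bestPriority[$id] = $priority;
            $result[$id]['output'] = $rule[7];
        }
    }
    return array_values($result);
}

$resolved = resolve_options($rules, $products);
echo json_encode($resolved, JSON_PRETTY_PRINT), "\n";
```

Because results are keyed by option ID while they are built, this version also returns a single row per option even when $ct_product_options holds several rules for the same ID.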

Added: unique_multidimensional_array reimplementation

function unique_multidimensional_array($array, $key) {
    $valuesByKey = [];
    foreach($array as $value) {
        // Index by key; a later element with the same key overwrites the earlier one.
        $valuesByKey[$value[$key]] = $value;
    }
    return array_values($valuesByKey);
}
Pedro Amaral Couto
  • This is a lot for me to review, but I don't understand `empty($data[0]) && empty($data[1])` inside of `get_option_output()`. Is this a typo? – mickmackusa Aug 20 '22 at 03:33
  • It should be `empty($product[0]) && empty($product[1])`. I changed it. The code has 976 characters. Variant A, that you provided, has 2,765 characters (excluding the initial array). My answer has 6,504 characters. The question has 11,851 characters. – Pedro Amaral Couto Aug 20 '22 at 10:57
  • Yes. It is a lot of characters in any version. Hard to analyze on my phone. Thanks for editing. – mickmackusa Aug 20 '22 at 11:11
  • It essentially maps each array element. For the "output" key, it evaluates some rules (has some X that follows the conditions C) to decide if it's "Not available" or `$option[7]`. That's it. Now I noticed I repeated `preg_match($option[5], $product[2])`. You don't need that. I'm editing again. – Pedro Amaral Couto Aug 20 '22 at 12:58
  • If it's available, then `preg_match($option[5], $product[2])` must be true. Also `preg_match($option[3], $product[0])` must be true, or `preg_match($option[4], $product[1])` must be true, or `empty($product[0]) && empty($product[1])` must be true. I used PHP 7.4 features (`fn`, `??`, …). And some functional programming (`array_map`, `some` implementation, ...). – Pedro Amaral Couto Aug 20 '22 at 13:05
  • Dear All, thank you very much for your answers (especially Pedro Amaral Couto!). In the next days I will test your suggestions and provide my results and feedback. – Clipart - Designer Aug 21 '22 at 12:15
  • Hello dear All, as a first step I tested the following part of the suggested code: `function some($array, $callback)... ...$ct_product_options);` It works really fast and provides correct results! Thanks! The only problem is with multiple regexes (e.g. '/.window$ | .wide windows./' does not work). Only the single rows '/.window$/' and '/.wide windows./' work. To be honest I cannot understand it. In general preg_match usually works well with multiple regexes. P. S. I will test the suggestion regarding the priority logic a little bit later. – Clipart - Designer Aug 23 '22 at 21:01
  • P. S. After more tests I realized that the code `function some($array, $callback)... ...$ct_product_options);` provides duplicates within the results. That means I need to use an additional function like my function unique_multidimensional_array in order to delete duplicates (?) – Clipart - Designer Aug 23 '22 at 21:31
  • It's weird the `array_map` function returning duplicates, unless `$ct_product_options` has duplicates. It's supposed to return the exact number of elements of the array. It replaces each element in the array with something else. It would be helpful to know what is the input you used. – Pedro Amaral Couto Aug 23 '22 at 23:07
  • As mentioned in the target description: _"The search strings can be defined many times ( e. g. 10 different search strings for the option "Windows" like "windows in the floor", "big windows", "window without glas" etc.)."_ That means the table $ct_product_options has many duplicates, at least 3 entries/rules per option, even if I consolidated the regexes with "|". That is because the same option can be identified by 3 different constellations (see the description above). Even the table $input_product_data can have duplicate descriptions for some items. – Clipart - Designer Aug 24 '22 at 05:09
  • Whatever is in `$ct_product_options` can't make `array_map` generate duplicates. Only `$ct_product_options` (used as an argument) can allow elements with the same ID ("duplicates"). I believe `unique_multidimensional_array` is overcomplicated. You only need the elements by ID and return the values. – Pedro Amaral Couto Aug 24 '22 at 10:51