The goal is to scan product descriptions and identify the characteristics/product options they mention. The input data has the following structure:
// TABLE WITH INPUT DATA. STRUCTURE: PRODUCT_CATEGORY [0], PRODUCT_NUMBER[1], DESCRIPTION OF AN OPTION [2]. THE INPUT DATA TABLE CAN CONSIST OF UP TO 400-500 ROWS
$input_product_data = array (
array('AAAA','1111','Chimney with red bricks in the center of the room'),
array('BBBB','2222','Two wide windows in the main floor'),
array('BBBB','2233','Plastic window has to be changed later'),
array('CCCC','3333','Roof tiles renewed in 2015'),
array('NULL','4444','Floor has been renovated for two years. Currently it has ground in wood.'),
array('NULL','NULL','Beautiful door in green color built at begin of 20th century')
);
There are 3 different constellations (ways) to indicate a product option:
- Only by search string within product description,
Example:
Input data: array('NULL','NULL','Beautiful door in green color built at begin of 20th century')
Search string: 'green color' within PRODUCT_DESCRIPTION
Result: Available
- By search string within product description + product category:
Example:
Input data: array('CCCC','NULL','Roof tiles renewed in 2015'),
Search strings: 'CCCC' within PRODUCT_CATEGORY + 'Roof tiles' within PRODUCT_DESCRIPTION
Result: Available
- By search string within product description + product category + product number.
Example:
Input data: array('AAAA','1111','Chimney with red bricks in the center of the room')
Search strings: 'AAAA' within PRODUCT_CATEGORY + '1111' within PRODUCT_NUMBER + 'Chimney' within PRODUCT_DESCRIPTION
Result: Available
IMPORTANT:
- The table with input data per product can consist of up to 450 description rows.
- Several search strings can be defined for the same option (e.g. 10 different search strings for the option "Windows", such as "windows in the floor", "big windows", "window without glass" etc.).
- The initial set of rules (combinations of product description + product category + product number) will consist of approx. 3000 rows and will be extended continuously by business users.
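To make the three constellations concrete, here is a minimal sketch of a single matching function. The name `rule_matches` and the rule layout (category, number, search string, with `null` meaning "not part of the rule") are assumptions made for illustration, not part of the code below. It also assumes exact comparison for category and number, while the description is searched case-insensitively.

```php
<?php
// Hypothetical helper: a rule is [category|null, number|null, search string].
// null means "this field is not part of the rule".
function rule_matches(array $rule, array $row): bool
{
    [$rule_category, $rule_number, $search] = $rule;
    [$category, $number, $description] = $row;

    // Cheap exact checks first; skip them when the rule leaves the field open.
    if ($rule_category !== null && $rule_category !== $category) {
        return false;
    }
    if ($rule_number !== null && $rule_number !== $number) {
        return false;
    }
    // Case-insensitive substring search within the description.
    return stripos($description, $search) !== false;
}

$row = array('AAAA', '1111', 'Chimney with red bricks in the center of the room');

var_dump(rule_matches(array(null, null, 'red bricks'), $row));   // description only
var_dump(rule_matches(array('AAAA', null, 'Chimney'), $row));    // + category
var_dump(rule_matches(array('AAAA', '1111', 'Chimney'), $row));  // + category + number
var_dump(rule_matches(array('AAAA', '9999', 'Chimney'), $row));  // number differs
```

The first three calls should report a match and the last one should not, because the product number in the rule differs from the one in the row.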
IMPLEMENTATION VARIANT A (using preg_match):
// TABLE FOR PRODUCT OPTIONS. STRUCTURE: ID[0], OPTION NAME[1], OPTION CATEGORY[2], OPTION-FAMILY[3], PROD.-NR[4], REG. EXPRESSION[5], PRIORITY[6], OUTPUT[7]
$ct_product_options = array (
array('0001', 'Chimney', 'Additional options', '/^AAAA/', '/9999/', '/^Chimney with./', '0', 'Available'),
array('0002', 'Material of ground floor', 'Additional options', '/NULL/', '/^4444$/', '/.wood./', '0', 'Wood'),
array('0003', 'Roof tiles', 'Basic options', '/^CCCC/', '/0022/', '/^Roof tiles./', '0', 'Available'),
array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/.window$/', '0', 'Available'),
array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/.wide windows./', '0', 'Available'),
array('0005', 'Door color', 'Basic options', '/NULL/', '/NULL/', '/green/', '0', 'Green'),
array('0006', 'Air condition', 'Additional options', '/NULL/', '/NULL/', '/^Air condition made in Japan/', '0', 'Green')
);
// FOR LOOP TO MAKE COMPARISON BETWEEN INPUT PRODUCT DATA AND PREDEFINED CUST. STRINGS
$matches_array = array();
foreach ($input_product_data as [$product_family, $product_number, $product_description]) {
    foreach ($ct_product_options as [$option_id, $option_name, $option_category, $product_family_reg_exp, $product_number_reg_exp, $regular_expression, $priority, $output]) {
        // MATCH IF THE DESCRIPTION AND (THE FAMILY OR THE NUMBER) MATCH
        if ((preg_match($regular_expression, $product_description) == 1
                && preg_match($product_family_reg_exp, $product_family) == 1)
            || (preg_match($regular_expression, $product_description) == 1
                && preg_match($product_number_reg_exp, $product_number) == 1)) {
            $matches_array[] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output" => $output);
        } else {
            // FALLBACK: DESCRIPTION ONLY, FOR ROWS WITHOUT FAMILY AND NUMBER.
            // NOTE: empty() IS FALSE FOR THE LITERAL STRING 'NULL' USED IN THE SAMPLE DATA.
            if (empty($product_family) && empty($product_number)) {
                if (preg_match($regular_expression, $product_description) == 1) {
                    $matches_array[] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output" => $output);
                }
            }
        }
    }
}
//echo "<pre>";
//print_r($matches_array);
// FUNCTION TO DELETE DUPLICATES FROM THE ARRAY WITH MATCHES (KEEPS THE FIRST ENTRY PER KEY VALUE)
function unique_multidimensional_array($array, $key) {
    $temp_array = array();
    $i = 0;
    $key_array = array();
    foreach ($array as $val) {
        if (!in_array($val[$key], $key_array)) {
            $key_array[$i] = $val[$key];
            $temp_array[$i] = $val;
        }
        $i++;
    }
    return $temp_array;
}
//echo "<br><h3>UNIQUE MATCHES</h3>";
// CALL OF THE FUNCTION TO GET UNIQUE MATCHES
$unique_matches = unique_multidimensional_array($matches_array, 'id');
sort($unique_matches);
//echo "<pre>";
//print_r($unique_matches);
// CALL OF THE FUNCTION TO CREATE LIST/ARRAY WITH ALL AVAILABLE PRODUCT OPTIONS
$list_all_product_options = unique_multidimensional_array($ct_product_options, 0);
$list_all_product_options_short = array();
foreach ($list_all_product_options as $option_item) {
$list_all_product_options_short[] = array("id" => $option_item[0], "option_name" => $option_item[1], "option_category" => $option_item[2]);
}
sort($list_all_product_options_short);
//echo "<h3>LIST WITH ALL PRODUCT OPTIONS (SHORT VERSION)</h3>\n";
//echo "<pre>";
//print_r($list_all_product_options_short);
$unique_matches = array_column($unique_matches, null, 'id');
// MERGE EVERY KNOWN OPTION WITH ITS MATCH; OPTIONS WITHOUT A MATCH GET 'Not available'
$result = array();
foreach ($list_all_product_options_short as $key => $value) {
    if (isset($unique_matches[$value['id']])) {
        $result[$key] = array_merge($value, $unique_matches[$value['id']]);
    } else {
        $result[$key] = array_merge($value, ['output' => 'Not available']);
    }
}
echo "<h3>FINAL RESULTS</h3>\n";
//echo "<pre><br>\n";
print_r($result);
The variant implemented with preg_match works well and offers good flexibility in how the rules are defined. For example, instead of spelling out the whole product number "2222", I can use just "/^2.../". I can also combine several patterns within one row by using "|" (e.g. ".wide windows. | some window | etc."). The problem is that with real data volumes (500 rows in $input_product_data and 3000 rows in $ct_product_options) the code is quite slow.
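One possible direction for speeding up the regex variant, sketched under assumptions: family and number values repeat a lot, so their regex results can be cached, and the description regex only needs to run when that cheap check has passed. Options that already have a match can also be skipped, which mirrors the later de-duplication by id. The helpers `cached_match` and `find_matches` and the shortened rule layout (id, family regex, number regex, description regex, output) are hypothetical, and whether this helps enough has to be measured on real data.

```php
<?php
// Hypothetical helper: remember the result of preg_match per (pattern, subject) pair.
function cached_match(string $pattern, string $subject, array &$cache): bool
{
    $key = $pattern . "\0" . $subject;
    if (!isset($cache[$key])) {
        $cache[$key] = preg_match($pattern, $subject) === 1;
    }
    return $cache[$key];
}

// Rules use a shortened layout for this sketch:
// [id, family regex, number regex, description regex, output].
function find_matches(array $rows, array $rules): array
{
    $cache = array();
    $matches = array();
    foreach ($rows as [$family, $number, $description]) {
        foreach ($rules as [$id, $family_re, $number_re, $desc_re, $output]) {
            if (isset($matches[$id])) {
                continue; // option already identified, first match wins
            }
            // Cheap, cached check on family/number first.
            $context_ok = cached_match($family_re, $family, $cache)
                || cached_match($number_re, $number, $cache);
            // The description regex only runs when the context fits.
            if ($context_ok && preg_match($desc_re, $description) === 1) {
                $matches[$id] = $output;
            }
        }
    }
    return $matches;
}

$rows = array(
    array('BBBB', '2222', 'Two wide windows in the main floor'),
    array('NULL', 'NULL', 'Beautiful door in green color built at begin of 20th century'),
    array('AAAA', '1111', 'Chimney with red bricks in the center of the room'),
);
$rules = array(
    array('0001', '/^AAAA/', '/9999/', '/^Chimney with./', 'Available'),
    array('0004', '/^B...$/', '/^2.../', '/.wide windows./', 'Available'),
    array('0005', '/NULL/', '/NULL/', '/green/', 'Green'),
    array('0006', '/NULL/', '/NULL/', '/^Air condition made in Japan/', 'Green'),
);
$matches = find_matches($rows, $rules);
print_r($matches);
```

Note that the "first match wins" shortcut is only valid while all rules have the same priority. Once priorities matter, the rules would have to be sorted by priority first, or the shortcut dropped.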
IMPLEMENTATION VARIANT B (using stripos):
// INPUT DATA WITH PRODUCT DESCRIPTION. STRUCTURE: PROD. FAMILY, PROD. NUMBER, PRODUCT DESCRIPTION
$input_product_data = array (
array('AAAA','1111','Chimney with red bricks in the center of the room'),
array('BBBB','2222','Two wide windows in the main floor'),
array('BBBB','2233','Plastic window has to be changed later'),
array('CCCC','3333','Roof tiles renewed in 2015'),
array('NULL','4444','Floor has been renovated for two years. Currently it has ground in wood.'),
array('NULL','NULL','Beautiful door in green color built at begin of 20th century')
);
// CUSTOMIZING TABLE FOR PRODUCT OPTIONS. STRUCTURE: ID[0], OPTION NAME[1], OPTION CATEGORY[2], OPTION-FAMILY[3], PROD.-NR[4], REG. EXPRESSION[5], PRIORITY[6], OUTPUT[7]
$ct_product_options = array (
array('0001', 'Chimney', 'Additional options', 'AAAA', '9999', 'Chimney with', '0', 'Available'),
array('0002', 'Material of ground floor', 'Additional options', 'NULL', '4444', 'wood', '0', 'Wood'),
array('0003', 'Roof tiles', 'Basic options', 'CCCC', '0022', 'Roof tiles', '0', 'Available'),
array('0004', 'Windows', 'Basic options', 'BBBB', '2222', 'window', '0', 'Available'),
array('0004', 'Windows', 'Basic options', 'BBBB', '2222', 'wide windows', '0', 'Available'),
array('0005', 'Door color', 'Basic options', 'NULL', 'NULL', 'green', '0', 'Green'),
array('0006', 'Air condition', 'Additional options', 'NULL', 'NULL', 'Air condition made in Japan', '0', 'Green')
);
// IMPORTANT: SEVERAL SEARCH STRINGS CAN BE DEFINED PER OPTION (e.g. 10 DIFFERENT SEARCH STRINGS FOR WINDOW). IN THE REGEX VARIANT, THE DOTS "." STAND FOR THE SURROUNDING SPACES, WHICH ARE IMPORTANT TO IDENTIFY AN OPTION EXACTLY (IN A REGEX, "." MATCHES ANY SINGLE CHARACTER).
// FOR LOOP TO MAKE COMPARISON BETWEEN INPUT PRODUCT DATA AND PREDEFINED CUST. STRINGS
$matches_array = array();
foreach ($input_product_data as [$product_family, $product_number, $product_description]) {
    foreach ($ct_product_options as [$option_id, $option_name, $option_category, $product_family_reg_exp, $product_number_reg_exp, $regular_expression, $priority, $output]) {
        // MATCH IF THE DESCRIPTION AND (THE FAMILY OR THE NUMBER) CONTAIN THE SEARCH STRINGS
        if ((stripos($product_description, $regular_expression) !== false
                && stripos($product_family, $product_family_reg_exp) !== false)
            || (stripos($product_description, $regular_expression) !== false
                && stripos($product_number, $product_number_reg_exp) !== false)) {
            $matches_array[] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output" => $output);
        } else {
            // FALLBACK: DESCRIPTION ONLY, FOR ROWS WITHOUT FAMILY AND NUMBER.
            // NOTE: empty() IS FALSE FOR THE LITERAL STRING 'NULL' USED IN THE SAMPLE DATA.
            if (empty($product_family) && empty($product_number)) {
                if (stripos($product_description, $regular_expression) !== false) {
                    $matches_array[] = array("id" => $option_id, "option_name" => $option_name, "option_category" => $option_category, "output" => $output);
                }
            }
        }
    }
}
//echo "<pre>";
//print_r($matches_array);
// FUNCTION TO DELETE DUPLICATES FROM THE ARRAY WITH MATCHES (KEEPS THE FIRST ENTRY PER KEY VALUE)
function unique_multidimensional_array($array, $key) {
    $temp_array = array();
    $i = 0;
    $key_array = array();
    foreach ($array as $val) {
        if (!in_array($val[$key], $key_array)) {
            $key_array[$i] = $val[$key];
            $temp_array[$i] = $val;
        }
        $i++;
    }
    return $temp_array;
}
//echo "<br><h3>UNIQUE MATCHES</h3>";
// CALL OF THE FUNCTION TO GET UNIQUE MATCHES
$unique_matches = unique_multidimensional_array($matches_array, 'id');
sort($unique_matches);
//echo "<pre>";
//print_r($unique_matches);
// CALL OF THE FUNCTION TO CREATE LIST/ARRAY WITH ALL AVAILABLE PRODUCT OPTIONS
$list_all_product_options = unique_multidimensional_array($ct_product_options, 0);
$list_all_product_options_short = array();
foreach ($list_all_product_options as $option_item) {
$list_all_product_options_short[] = array("id" => $option_item[0], "option_name" => $option_item[1], "option_category" => $option_item[2]);
}
sort($list_all_product_options_short);
//echo "<h3>LIST WITH ALL PRODUCT OPTIONS (SHORT VERSION)</h3>\n";
//echo "<pre>";
//print_r($list_all_product_options_short);
// ::::::::::::::::::::::::::::::::::
$unique_matches = array_column($unique_matches, null, 'id');
// MERGE EVERY KNOWN OPTION WITH ITS MATCH; OPTIONS WITHOUT A MATCH GET 'Not available'
$result = array();
foreach ($list_all_product_options_short as $key => $value) {
    if (isset($unique_matches[$value['id']])) {
        $result[$key] = array_merge($value, $unique_matches[$value['id']]);
    } else {
        $result[$key] = array_merge($value, ['output' => 'Not available']);
    }
}
echo "<h3>FINAL RESULTS</h3>\n";
//echo "<pre><br>\n";
print_r($result);
This variant runs much faster, but it does not provide the flexibility of regex.
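Some of the missing flexibility could perhaps be recovered without regex. The sketch below assumes a hypothetical mini-syntax that is not part of the code above: "a|b|c" in a search string means "any of these alternatives", and a trailing "*" in a family or number pattern means "starts with". The function names `contains_any` and `matches_code` are made up for illustration.

```php
<?php
// Hypothetical mini-syntax: alternatives separated by "|", each searched with stripos.
function contains_any(string $haystack, string $alternatives): bool
{
    foreach (explode('|', $alternatives) as $needle) {
        $needle = trim($needle);
        if ($needle !== '' && stripos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical mini-syntax: "2*" matches every code starting with "2", "*" matches everything.
function matches_code(string $code, string $pattern): bool
{
    if ($pattern === '*') {
        return true;
    }
    if (substr($pattern, -1) === '*') {
        $prefix = substr($pattern, 0, -1);
        // Case-insensitive comparison of the first strlen($prefix) bytes.
        return strncasecmp($code, $prefix, strlen($prefix)) === 0;
    }
    return strcasecmp($code, $pattern) === 0;
}

var_dump(contains_any('Two wide windows in the main floor', 'big windows|wide windows'));
var_dump(contains_any('Roof tiles renewed in 2015', 'big windows|window'));
var_dump(matches_code('2233', '2*'));
var_dump(matches_code('3333', '2*'));
```

This covers roughly what "/^2.../" and "a|b" are used for in the regex variant, while every check stays a plain string function. Anchors such as "^" or "$" on the description would need additional syntax.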
So, my questions:
Do you see any way to optimize VARIANT A to make it faster, or to optimize VARIANT B to make it more flexible?
A specific question: how can I add logic for the PRIORITY parameter from the table $ct_product_options?
The business logic for it is the following: by default, all rows/rules have priority "0". Some of them will get a priority ">0" (e.g. "1" or "2" etc.). The rule with the highest priority should overwrite the other rules.
E.g.
This rule with priority "0" identifies windows in the house.
array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/.wide windows./', '0', 'Available')
At the same time, this rule with priority "1" tells us that the windows are no longer available. That means we have to get "Not available" in the final results.
array('0004', 'Windows', 'Basic options', '/^B...$/', '/^2.../', '/^Windows have been removed from the whole building last year/', '1', 'Not available')
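One way the priority rule could be expressed, as a sketch only: instead of keeping the first match per option id, keep the match with the highest priority. This assumes that each entry in the matches array also carries a "priority" key (the loops above do not store it yet), and the function name `resolve_by_priority` is hypothetical.

```php
<?php
// Hypothetical helper: per option id, keep the match with the highest priority.
// On equal priority the earlier match is kept.
function resolve_by_priority(array $matches): array
{
    $best = array();
    foreach ($matches as $match) {
        $id = $match['id'];
        if (!isset($best[$id]) || (int) $match['priority'] > (int) $best[$id]['priority']) {
            $best[$id] = $match;
        }
    }
    return $best;
}

// Assumed shape of the matches, extended by the rule's priority.
$matches = array(
    array('id' => '0004', 'option_name' => 'Windows', 'output' => 'Available', 'priority' => '0'),
    array('id' => '0004', 'option_name' => 'Windows', 'output' => 'Not available', 'priority' => '1'),
    array('id' => '0005', 'option_name' => 'Door color', 'output' => 'Green', 'priority' => '0'),
);

$resolved = resolve_by_priority($matches);
echo $resolved['0004']['output'], "\n"; // Not available
echo $resolved['0005']['output'], "\n"; // Green
```

Because the result is keyed by option id, it could take the place of the de-duplication step plus the array_column call before the final merge.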