I need to extract an array from an HTML block: just the tools themselves, without the word "tools", and the match needs to be non-greedy. The problem is that the HTML block comes in two forms. Sometimes it looks like this:

<p>tools :<br>
 1 Hex Key (or two Hex key)<br>
 1 screww drivers<br>
 1 hammer <br>
 1 drill </p>

and the second form is like this:

<p>"tools :
 1 Hex Key (or two Hex key)
 1 screww drivers
 1 hammer 
 1 drill "</p>

I tried this regex, but it doesn't work:

  $tools = "<p>tools :<br>
  1 Hex Key (or two Hex key)<br>
  1 screww drivers<br>
  1 hammer <br>
  1 drill </p>";

 $tools_array = preg_match_all('#<p>tools:([^<>]*<br\s*/?>[^<>]*)+</p>#s', 
 $tools);

Any ideas?

3 Answers


This regex should do what you want:

^(?!\s*<p>tools\s*:)\s*(.*?)(?=\s*(<br|</p|$))

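It anchors at the start of each line that does not begin with <p>tools :, skips leading whitespace, then captures characters (using a non-greedy match) until what follows is optional whitespace and one of <br, </p, or the end of the line. The captured text is returned in group 1. The m flag makes ^ and $ match at the start and end of every line in a multi-line string, rather than only at the start and end of the whole string.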

In PHP:

preg_match_all('#^(?!\s*<p>tools\s*:)\s*(.*?)(?=\s*(<br|</p|$))#m', $tools, $tools_array);
print_r($tools_array[1]);

Output:

Array
(
    [0] => 1 Hex Key (or two Hex key)
    [1] => 1 screww drivers
    [2] => 1 hammer
    [3] => 1 drill
)
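
The question also mentions a second form of the block, without <br> tags and with the text wrapped in double quotes. As a hedged sketch, the same pattern can be loosened to tolerate an optional quote after <p> and before </p>. This assumes the quotes are literal characters in the input; the function name extractTools and the two sample variables are made up for illustration.

```php
<?php
// Hypothetical helper: the pattern above, with an optional double quote
// allowed after <p> and before </p> (assumption: the quotes in the second
// form are literal characters in the HTML).
function extractTools(string $html): array
{
    $pattern = '#^(?!\s*<p>"?tools\s*:)\s*(.*?)(?=\s*(?:"?\s*</p|<br|$))#m';
    preg_match_all($pattern, $html, $matches);
    return $matches[1]; // group 1 holds one tool per line
}

// First form: lines separated by <br>
$withBr = "<p>tools :<br>\n 1 Hex Key (or two Hex key)<br>\n 1 screww drivers<br>\n 1 hammer <br>\n 1 drill </p>";
// Second form: plain newlines, text wrapped in double quotes
$quoted = "<p>\"tools :\n 1 Hex Key (or two Hex key)\n 1 screww drivers\n 1 hammer \n 1 drill \"</p>";

print_r(extractTools($withBr));
print_r(extractTools($quoted));
```

Both calls should yield the same four entries. If the quotes are not actually part of the markup, the original pattern is enough.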


Nick

Try splitting the string on newlines:

$tools = explode("\n",$tools);

The result is:

Array
(
    [0] => <p>tools :<br>
    [1] =>   1 Hex Key (or two Hex key)<br>
    [2] =>   1 screww drivers<br>
    [3] =>   1 hammer <br>
    [4] =>   1 drill </p>
)

Then unset the first element and strip the tags from the remaining elements:

unset($tools[0]);
$tools = array_map('strip_tags', $tools);

The result is:

Array
(
    [1] =>   1 Hex Key (or two Hex key)
    [2] =>   1 screww drivers
    [3] =>   1 hammer 
    [4] =>   1 drill 
)
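
Note that the keys start at 1 and the values keep their surrounding whitespace. A minimal sketch that folds the two steps into one function, reindexes from 0, and trims each entry might look like this. The name toolsFromLines is hypothetical, and trimming double quotes is an assumption made so that the quoted second form from the question is covered too.

```php
<?php
// Hypothetical helper combining explode + strip_tags from the steps above.
function toolsFromLines(string $html): array
{
    $lines = explode("\n", $html);
    array_shift($lines); // drop the "tools :" header line and reindex from 0

    return array_map(function (string $line): string {
        // Assumption: stray double quotes (second form) are not wanted,
        // so they are trimmed together with whitespace.
        return trim(strip_tags($line), " \t\n\r\0\x0B\"");
    }, $lines);
}

$withBr = "<p>tools :<br>\n  1 Hex Key (or two Hex key)<br>\n  1 screww drivers<br>\n  1 hammer <br>\n  1 drill </p>";
$quoted = "<p>\"tools :\n 1 Hex Key (or two Hex key)\n 1 screww drivers\n 1 hammer \n 1 drill \"</p>";

print_r(toolsFromLines($withBr));
print_r(toolsFromLines($quoted));
```

This relies on the header always being on the first line, which holds for both samples in the question but may not hold for other input.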

Another approach, without using regex.

Snippet

$tools = "<p>tools :<br>
  1 Hex Key (or two Hex key)<br>
  1 screww drivers<br>
  1 hammer <br>
  1 drill </p>";
$search = ['<p>', '</p>', '<br>', 'tools :']; // Add more words to be removed
$filteredStr = str_replace($search, '', $tools);
$res = explode(PHP_EOL, $filteredStr);
array_shift($res); // Removing empty element at the beginning of array
print_r($res);

Output

Array
(
    [0] =>   1 Hex Key (or two Hex key)
    [1] =>   1 screww drivers
    [2] =>   1 hammer 
    [3] =>   1 drill 
)
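
One caveat: PHP_EOL is "\r\n" on Windows and "\n" elsewhere, so splitting on it only works when the string's line endings match the platform. The following sketch normalizes line endings first and also trims each entry. The function name toolsViaReplace is made up, and removing every double quote is an assumption intended to cover the quoted second form from the question (it would also strip quotes that belong to a tool name).

```php
<?php
// Hypothetical variant of the str_replace approach above.
function toolsViaReplace(string $html): array
{
    // Normalize Windows/old-Mac line endings so we can split on "\n".
    $html = str_replace(["\r\n", "\r"], "\n", $html);

    // Assumption: all double quotes are noise and can be removed.
    $search = ['<p>', '</p>', '<br>', 'tools :', '"'];
    $clean  = str_replace($search, '', $html);

    $lines = array_map('trim', explode("\n", $clean));

    // Drop empty lines (such as the former header line) and reindex.
    return array_values(array_filter($lines, 'strlen'));
}

$withBr = "<p>tools :<br>\r\n  1 Hex Key (or two Hex key)<br>\r\n  1 screww drivers<br>\r\n  1 hammer <br>\r\n  1 drill </p>";
$quoted = "<p>\"tools :\n 1 Hex Key (or two Hex key)\n 1 screww drivers\n 1 hammer \n 1 drill \"</p>";

print_r(toolsViaReplace($withBr));
print_r(toolsViaReplace($quoted));
```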


Reference
str_replace
explode
array_shift

Shahnawaz Kadari