-1

Possible Duplicate:
How to parse and process HTML with PHP?

I have a bunch of code I need to search through to get this kind of data

<span class="parameter-name-value">
    <span class="parameter-name">....</span> 
    <span class="parameter-value">....</span>
</span>

into a PHP array in a format of

$array = array(
    array("parameter-name", "parameter-value"),
    array("parameter-name", "parameter-value"),
    array("parameter-name", "parameter-value")
);

What kind of regular expression do I need?

MikkoP

3 Answers

1

This will be your setup:

function get_tags($string, $start, $end)
{
    // Escape every regex metacharacter (and the "/" delimiter) in both markers
    $start = preg_quote($start, "/");
    $end   = preg_quote($end, "/");
    // Lazy, case-insensitive match that is allowed to span newlines
    preg_match_all("/{$start}(.*?){$end}/si", $string, $matching_data);
    // Index 0 holds the full matches, including the start and end markers
    return $matching_data[0];
}

function return_between($string, $start, $stop, $type)
{
    $temp = split_string($string, $start, false, $type);
    return split_string($temp, $stop, true, $type);
}

function get_attribute($tag, $attribute)
{   
    // Remove all line feeds from the string
    $cleaned_html = str_replace("\r", "", $tag);   
    $cleaned_html = str_replace("\n", "", $cleaned_html);

    // Use return_between() to find the properly quoted value for the attribute
    return return_between($cleaned_html, $attribute."=\"", "\"", true);
}
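
Note that return_between() calls split_string(), which is not defined anywhere in this answer. A minimal sketch of such a helper follows. The parameter meanings are assumptions inferred from how return_between() calls it: the third argument selects the text before (true) or after (false) the delimiter, and the fourth excludes the delimiter from the result when true. The function body is hypothetical, not the answer author's code.

```php
<?php
// Hypothetical helper: split_string() is NOT defined in the answer above.
// Assumed semantics (inferred from return_between()):
//   $before    = true  -> return the text before the delimiter
//   $before    = false -> return the text after the delimiter
//   $exclusive = true  -> leave the delimiter itself out of the result
function split_string($string, $delimiter, $before, $exclusive)
{
    // Case-insensitive search for the first occurrence of the delimiter
    $pos = stripos($string, $delimiter);
    if ($pos === false) {
        return '';   // delimiter not found: nothing to return
    }
    if ($before) {
        $end = $exclusive ? $pos : $pos + strlen($delimiter);
        return substr($string, 0, $end);
    }
    $start = $exclusive ? $pos + strlen($delimiter) : $pos;
    return substr($string, $start);
}

// The same two steps return_between() performs for a class attribute
$tag   = '<span class="parameter-name">';
$after = split_string($tag, 'class="', false, true);
$value = split_string($after, '"', true, true);
echo $value, "\n";
```

Returning an empty string when the delimiter is missing is a design choice of this sketch. It lets the !empty() check in the usage code skip tags that have no class attribute.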

To use it, something like this:

$open_tag = '<span';
$close_tag = '>';

$span_tags = get_tags($html_string, $open_tag, $close_tag);
$span_tag_class_names = array();

foreach ($span_tags as $key => $tag) {
    $class_name = get_attribute($tag, "class");
    if (!empty($class_name)) {
        $span_tag_class_names[] = $class_name;
    }
}

print_r($span_tag_class_names);

As with all regex, your mileage may vary.

jdstankosky
1

If you know your data will look exactly as you've presented and will never change, a regular expression is both faster and easier than loading an XML library. Keep in mind, though, that even a small change in the markup will make it fail; a parser-based solution is much more robust.

$data = '<span class="parameter-name-value">
    <span class="parameter-name">A</span>
    <span class="parameter-value">x</span>
</span>
<span class="parameter-name-value">
    <span class="parameter-name">B</span>
    <span class="parameter-value">y</span>
</span>
<span class="parameter-name-value">
    <span class="parameter-name">C</span>
    <span class="parameter-value">z</span>
</span>
';

$pattern = '@<span class=\"parameter-name-value\">
    <span class=\"parameter-name\">(.*)</span>
    <span class=\"parameter-value\">(.*)</span>
</span>@';

preg_match_all($pattern, $data, $matches);
list($_, $keys, $values) = $matches;
$result = array_combine($keys, $values);
print_r($result);

Output

Array
(
    [A] => x
    [B] => y
    [C] => z
)
kba
  • Clean +1. But I'd use `\s*` between the tags for more flexible matching and use `(.*?)` (with lazy quantifier) for the captures and single-line `s` modifier to allow newlines in the values. – ridgerunner Oct 25 '12 at 15:19
  • Sorry, but this doesn't work. The spans aren't listed directly after each other. – MikkoP Oct 25 '12 at 16:44
  • @MikkoP Can you show an example of where it doesn't work? Putting anything in between shouldn't be a problem. – kba Oct 25 '12 at 16:53
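
The adjustments ridgerunner suggests in the comments (`\s*` between the tags, lazy `(.*?)` captures, and the `s` modifier) could look roughly like the sketch below. The sample data and the use of PREG_SET_ORDER to build name/value pairs are assumptions added for illustration. It is still a regex, so it stays sensitive to changes in attribute order or nesting.

```php
<?php
// Sample input, assumed for illustration; whitespace between tags varies
$data = '<span class="parameter-name-value">
    <span class="parameter-name">A</span>
    <span class="parameter-value">x</span>
</span>
<span class="parameter-name-value"><span class="parameter-name">B</span>
<span class="parameter-value">y</span></span>';

// \s* tolerates any whitespace between tags, (.*?) is lazy,
// and the trailing "s" modifier lets "." match newlines inside values
$pattern = '@<span class="parameter-name-value">\s*'
         . '<span class="parameter-name">(.*?)</span>\s*'
         . '<span class="parameter-value">(.*?)</span>\s*'
         . '</span>@s';

// PREG_SET_ORDER groups the captures per match instead of per group
preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

$result = array();
foreach ($matches as $m) {
    // $m[1] is the name capture, $m[2] the value capture
    $result[] = array(trim($m[1]), trim($m[2]));
}
print_r($result);
```

This yields a list of array(name, value) pairs, which is the shape the question asks for, rather than the name => value map produced by array_combine().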
0

Your HTML is not entirely clear, but you can use DOMDocument regardless of the format:

$html = '<span class="parameter-name-value">
    <span class="parameter-name">A</span> 
    <span class="parameter-value">1</span>
</span>
<span class="parameter-name-value">
    <span class="parameter-name">B</span> 
    <span class="parameter-value">2</span>
</span>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$span = $xpath->query("//span/span");

$list = array();
$list2 = array();

for($i = 0; $i < $span->length; $i += 2) {
    $name = $span->item($i);
    $value = $span->item($i + 1);
    $list[] = array($name->getAttribute('class') => $name->nodeValue,$value->getAttribute('class') => $value->nodeValue);
    $list2[] = array($name->getAttribute('class'),$value->getAttribute('class'));
}

var_dump($list);
var_dump($list2);

Output $list

array
  0 => 
    array
      'parameter-name' => string 'A' (length=1)
      'parameter-value' => string '1' (length=1)
  1 => 
    array
      'parameter-name' => string 'B' (length=1)
      'parameter-value' => string '2' (length=1)

Output $list2

array
  0 => 
    array
      0 => string 'parameter-name' (length=14)
      1 => string 'parameter-value' (length=15)
  1 => 
    array
      0 => string 'parameter-name' (length=14)
      1 => string 'parameter-value' (length=15)
Baba
  • Great, but what is the `getAttribute` function? `Fatal error: Call to a member function getAttribute() on a non-object` – MikkoP Oct 25 '12 at 16:44
  • Worked with the XML you gave me http://codepad.viper-7.com/ppUOyf it works – Baba Oct 25 '12 at 16:59
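
One plausible cause of the non-object error, which is an assumption because the failing HTML is not shown, is that `//span/span` matches an odd number of nodes or picks up unrelated nested spans, so `item($i + 1)` returns null. A sketch that selects each wrapper by class and looks up its two children relative to it avoids pairing by position. The sample markup and the variable names are illustrative only.

```php
<?php
// Sample input, assumed for illustration: pairs separated by other markup
$html = '<div><span class="parameter-name-value">
    <span class="parameter-name">A</span>
    <span class="parameter-value">1</span>
</span>
<p>unrelated <span>markup</span></p>
<span class="parameter-name-value">
    <span class="parameter-name">B</span>
    <span class="parameter-value">2</span>
</span></div>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$pairs = array();

// Select each wrapper by class, then query its children relative to it
foreach ($xpath->query('//span[@class="parameter-name-value"]') as $wrapper) {
    $name  = $xpath->query('span[@class="parameter-name"]', $wrapper)->item(0);
    $value = $xpath->query('span[@class="parameter-value"]', $wrapper)->item(0);
    if ($name === null || $value === null) {
        continue;   // incomplete pair: skip it instead of failing
    }
    $pairs[] = array(trim($name->nodeValue), trim($value->nodeValue));
}
print_r($pairs);
```

The exact-match `@class="..."` predicates assume each span carries only that one class. Elements with several classes would need a `contains()`-based test instead.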