
Users can enter text for a piece of content using a WYSIWYG editor, and the resulting HTML is placed in the variable $body. This may include multiple instances of style="[maybe stuff] height:xpx [maybe stuff]" or height="xpx".

I need to get all of the height values that exist (the numbers only) so that I can add them together.

Note that there may be other integer values within the string, so I can't just grab all integers.

If the solution uses regex: I have never been able to understand it, and I've read that there are security issues with regex, so ideally I'm looking for a safe solution!

I'm certain this must be quite simple but I'm struggling!
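Since regex is what the question hopes to avoid, here is a minimal regex-free sketch that uses PHP's DOM extension plus plain string functions. `parsePixels`, `styleHeight` and `sumHeights` are hypothetical helper names, and the sketch assumes heights are whole numbers with an optional `px` unit; if an element has both a `height` attribute and a style height, only the attribute is counted.

```php
<?php
// Sketch: sum every pixel height in user-entered HTML without regular expressions.
// Assumptions: heights are whole numbers with an optional "px" unit; if an element
// has both a height attribute and a style height, only the attribute is counted.

// Hypothetical helper: "80", "80px" or " 80 px " -> 80; "50%", "auto", "" -> null.
function parsePixels(string $value): ?int
{
    $number = rtrim(strtolower(trim($value)), "px \t");
    return ctype_digit($number) ? (int) $number : null;
}

// Hypothetical helper: height from a style attribute such as "color:#000; height: 80px".
function styleHeight(string $style): ?int
{
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        // Exact property name match, so line-height and max-height are ignored.
        if (count($parts) === 2 && strtolower(trim($parts[0])) === 'height') {
            return parsePixels($parts[1]);
        }
    }
    return null;
}

function sumHeights(string $body): int
{
    if (trim($body) === '') {
        return 0; // DOMDocument::loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // don't print warnings for sloppy HTML
    $dom->loadHTML($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $total = 0;
    foreach ($dom->getElementsByTagName('*') as $element) {
        if ($element->hasAttribute('height')) {
            $total += parsePixels($element->getAttribute('height')) ?? 0;
        } elseif ($element->hasAttribute('style')) {
            $total += styleHeight($element->getAttribute('style')) ?? 0;
        }
    }
    return $total;
}

$body = '<p style="color:#000; height: 80px">a</p><img height="30" src="x.png">'
      . '<div style="line-height:20px; padding:10px">b</div>';
echo sumHeights($body), "\n"; // 110
```

Splitting the style attribute on `;` and `:` and comparing the property name exactly is what keeps other integers (padding, widths, `line-height`) out of the total.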

Florent
Rob
  • There is nothing fundamentally wrong with using regular expressions for this. See e.g. [Parse a CSS file with PHP](http://stackoverflow.com/q/3618381) – Pekka Oct 22 '12 at 09:56
  • What have you tried? Post some code! – Pratik Oct 22 '12 at 09:57

3 Answers


This should do the trick, if I'm not mistaken:

preg_match_all('/height(\:|\=)"*\s*([0-9]+[^;"]+);*/i','<tr style="height: 123px; border: none;"><tr height="125px">',$matches);
var_dump($matches[2]);//array('123px','125px');
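The regex above returns strings such as `'123px'`, while the question asks for the numbers so they can be added together. A minimal sketch of that last step, assuming every height is a whole number followed by its unit:

```php
<?php
// Sketch: turn the matched strings ("123px", "125px") into numbers and sum them.
$html = '<tr style="height: 123px; border: none;"><tr height="125px">';
preg_match_all('/height(\:|\=)"*\s*([0-9]+[^;"]+);*/i', $html, $matches);

// intval() keeps the leading digits of each match and drops the unit.
$numbers = array_map('intval', $matches[2]);
$total = array_sum($numbers);
echo $total, "\n"; // 248
```

Note that this pattern also matches `line-height` and `max-height`, which is one more reason to prefer the DOM approach described next.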

But since you're going to let this regex loose on HTML (if I'm not mistaken), I'd look at ways to parse the DOM and use the DOMElement's methods to get what I want. It's a far more robust take on the problem.

As requested by OP:

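function getDeepChildren($node,&$nodeArray)
{//recursive function to flatten the DOM into a list of elements
    foreach($node->childNodes as $child)
    {//loop through the direct children only: getElementsByTagName('*') returns ALL descendants, so recursing on it would add nested nodes more than once
        if ($child->nodeType !== XML_ELEMENT_NODE)
        {
            continue;//skip text, comment and doctype nodes, they have no attributes
        }
        $nodeArray[] = $child;//add child
        if ($child->hasChildNodes())
        {//if child node has children of its own
            getDeepChildren($child,$nodeArray);//get the children and append to nodeArray
        }
    }
}//no return value, $nodeArray is passed by reference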
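$dom = new DOMDocument();
libxml_use_internal_errors(true);//user-entered HTML is rarely valid; collect parser warnings instead of printing them
$dom->loadHTML($body);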
$nodes = array();
getDeepChildren($dom,$nodes);//$nodes is passed by reference
$height = array();
while($node = array_shift($nodes))
{//$height[i][0] === height value, $height[i][1] is reference to node
    if ($node->hasAttribute('height'))
    {
        $height[] = array($node->getAttribute('height'),$node);
        continue;//already got what we need, no need for slow preg_match
        //in case of <div height="123px" style="border:1px solid #F00;"> for example...
    }
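    if ($node->hasAttribute('style') && preg_match('/(?:^|;)\s*height\s*:\s*([0-9]+\s*[a-z]*)\s*(?:;|$)/i',$node->getAttribute('style'),$match))
    {//(?:^|;) skips line-height/max-height, and (?:;|$) also accepts a height that is the last declaration without a trailing ;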
        $height[] = array($match[1],$node);
    }
}
var_dump($height);//should contain everything you're looking for
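The loop above collects the values but does not add them up, which is what the question ultimately needs. A small sketch, where `totalHeight` is a hypothetical helper and only whole numbers with an optional `px` unit are counted (values like `50%` or `auto` are skipped):

```php
<?php
// Hypothetical helper: adds up the values collected in $height, where each entry
// is array(value, node) as built by the loop above. Only plain integers, optionally
// followed by "px", are counted; anything else (e.g. "50%", "auto") is skipped.
function totalHeight(array $height): int
{
    $total = 0;
    foreach ($height as $entry) {
        $value = strtolower(trim($entry[0]));
        // rtrim strips a trailing "px" (and spaces) so ctype_digit can check the rest
        if (ctype_digit(rtrim($value, 'px '))) {
            $total += (int) $value; // (int) "123px" is 123
        }
    }
    return $total;
}

// Example input in the same shape as $height; null stands in for the DOM node.
$height = array(array('40px', null), array('15 px', null), array('50%', null));
echo totalHeight($height), "\n"; // 55
```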

For a more OO approach, I suggest looking at a couple of recursive DOMNode iterator classes.
Passing arrays by reference is generally discouraged, but it's the easiest way to get what you need here. An alternative version that returns the array instead would be:

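function getDeepChildren($node)
{//returns a flat array of all element descendants of $node
    $nodes = array();
    foreach($node->childNodes as $child)
    {//direct children only, so nested nodes are not added twice
        if ($child->nodeType !== XML_ELEMENT_NODE)
        {
            continue;//skip text, comment and doctype nodes
        }
        $nodes[] = $child;
        if ($child->hasChildNodes())
        {
            $nodes = array_merge($nodes,getDeepChildren($child));
        }
    }
    return $nodes;
}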
//instead of getDeepChildren($dom,$nodes), usage is:
$nodes = getDeepChildren($dom);
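Because `getElementsByTagName('*')` (on both `DOMDocument` and `DOMElement`) already returns every descendant element in document order, the flat list can also be built without any recursion. A sketch, where `getAllElements` is a hypothetical name:

```php
<?php
// Sketch: getElementsByTagName('*') already walks the whole subtree in document
// order, so no recursion is needed. getAllElements is a hypothetical helper name.
function getAllElements($root): array
{
    // $root must be a DOMDocument or DOMElement; iterator_to_array copies the live
    // DOMNodeList into a plain array (false = renumber keys from 0)
    return iterator_to_array($root->getElementsByTagName('*'), false);
}

$dom = new DOMDocument();
$dom->loadHTML('<div style="height:10px"><p height="20">x</p></div>');
$names = array_map(function ($element) {
    return $element->nodeName;
}, getAllElements($dom));
echo implode(',', $names), "\n"; // html,body,div,p
```

The `html` and `body` elements appear because `loadHTML()` wraps a fragment in them.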
Elias Van Ootegem
  • Thanks everyone for your help! Elias Van Ootegem, your regex worked perfectly; however, I decided to take your advice on parsing the DOM. Answered below. – Rob Oct 22 '12 at 10:58
  • @user1173640: just a tip, instead of your snippet, traverse the elements and use `hasAttribute('height')` and `hasAttribute('style')`. In the case of a `height` attribute, `$node->getAttribute('height')` will give you what you need; otherwise `preg_match('/height\s*\:\s*([0-9]+\s*[a-z]+)\s*;/i',$node->getAttribute('style'),$match)` will do the trick – Elias Van Ootegem Oct 22 '12 at 11:39
  • Thanks Elias, would this fix HamZa's comment re possible spaces that might exist? This is actually the first time I've ever parsed a dom like this (I'm getting there!) and I can't quite work out how to traverse the elements as you suggest. I'd be extremely grateful if you could type that out within my code below. Thanks again! – Rob Oct 22 '12 at 12:38
  • @user1173640: Updated my answer; that's the way I'd do things. And to address your concerns: yes, whitespace in the style attribute of a tag is taken into account. Since JS has a pretty similar regex engine, `'foo style="bar:123; height : 123px ; width:125;"'.match(/height\s*\:\s*([0-9]+\s*[a-z]+)\s*;/i);` and `'foo style="bar:123; height:123px ; width:125;"'.match(/height\s*\:\s*([0-9]+\s*[a-z]+)\s*;/i);` both capture `123px` as the first group; just try it in your console – Elias Van Ootegem Oct 22 '12 at 13:13
  • Thanks again Elias! I really appreciate it. Unfortunately, no matter how much I've fiddled, searched and tweaked, I've hit a brick wall. I've mostly been getting: Fatal error: Cannot redeclare getDeepChildren() (previously declared in /Applications/XAMPP/xamppfiles/htdocs/malt/sites/all/themes/maltblocks/templates/inject-content.tpl.php:29) in /Applications/XAMPP/xamppfiles/htdocs/malt/sites/all/themes/maltblocks/templates/inject-content.tpl.php on line 29. In the end I've decided to add a hacky workaround, which, seeing as it will rarely be used anyway (and only by me), should be OK. – Rob Oct 24 '12 at 16:54
  • If you get that `redeclare` error, it means either you're working in an OO context and accessing the same method several times, each time attempting to redeclare the function, or you're using recursive calls, _or_ you're including the `getDeepChildren` function definition multiple times. Set up a pastebin and I might be able to fix this problem for you – Elias Van Ootegem Oct 25 '12 at 12:31
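Judging by the path in the error, the function is defined inside a template file (`inject-content.tpl.php`), which is likely included once per rendered item. Assuming that is the cause, a sketch of one common fix is to declare the function only if it does not exist yet:

```php
<?php
// Sketch, assuming the template that defines getDeepChildren() is included more
// than once per request: only declare the function if it does not exist yet.
if (!function_exists('getDeepChildren')) {
    function getDeepChildren($node)
    {
        // Returns a flat array of all element descendants of $node, in document order.
        $nodes = array();
        foreach ($node->childNodes as $child) {
            if ($child->nodeType === XML_ELEMENT_NODE) {
                $nodes[] = $child;
                $nodes = array_merge($nodes, getDeepChildren($child));
            }
        }
        return $nodes;
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<div><p>a</p><p>b</p></div>');
echo count(getDeepChildren($dom)), "\n"; // 5: html, body, div, p, p
```

A cleaner alternative is to move the function into its own file and load it with `require_once`, which PHP includes only once per request.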

Thanks everyone for your help! Elias Van Ootegem, your regex worked perfectly; however, I decided to take your advice on parsing the DOM. This is the solution I came up with:

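$dom = new DOMDocument();
libxml_use_internal_errors(true);//user-entered HTML is rarely valid; don't print parser warnings
$dom->loadHTML($body);
$xpath = new DOMXPath($dom);

$tags = $xpath->query('//@style');//style attributes on any element, not only divs
$height = ';height:';//leading ; so that "line-height:" and "max-height:" are not matched
$totalheight = 0;
foreach ($tags as $tag) {
    //strip whitespace so "height : 80px" is found too, and prefix ; so a height at the very start matches
    $str = ';' . str_replace(array(' ', "\t", "\r", "\n"), '', $tag->nodeValue);
    $height_str = strstr($str, $height);
    if ($height_str === false) {
        continue;//this style attribute has no height
    }
    //(int) keeps the leading digits, e.g. "80px;width:20px" gives 80
    $totalheight += (int) substr($height_str, strlen($height));
}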
Rob
  • What if the user types in "height : Xpx" (with spaces)? Expect the unexpected! – HamZa Oct 22 '12 at 11:35
  • Ok, commenting here might seem stalker-ish, but anyway: +1 for taking the time and effort to look into the parser-option. Since you've never done this, and nobody posted this link here, if you want to know _why_ you shouldn't use regex for this, and become a missionary for the enlightened elite who refuse to give in to _"the dark side"_, [this is a ravishing read](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Elias Van Ootegem Oct 22 '12 at 13:22

I'm not that familiar with regex, but maybe this will work?

<?php

$message = 'Hello world <p style="height: 80 px;width:20px">Some example</p><br />Second: DERP DERP <p style="color:#000;height:30 px;padding:10px;"> DERP</p>';
preg_match_all('#(?<![\w-])height\s*:\s*([0-9]+)\s*px#i', $message, $results);//(?<![\w-]) skips line-height, max-height, etc.
echo array_sum($results[1]);//$results[1] holds just the captured numbers (80 and 30), so no str_replace clean-up is needed

?>
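The question also mentions `height="xpx"` attributes, which the style-only pattern above does not count. A sketch that adds a second pattern for them; it is still regex-based, so the usual caveats about parsing HTML with regex apply (for example, it also matches text that merely looks like markup):

```php
<?php
// Sketch: count inline style heights AND height="..." attributes.
$message = '<p style="height: 80 px;width:20px">a</p><img height="30px" src="x.png">'
         . '<div style="line-height:12px">b</div>';

// style="...height: 80px..." (the lookbehind skips line-height, max-height, etc.)
preg_match_all('#(?<![\w-])height\s*:\s*([0-9]+)\s*px#i', $message, $styleMatches);
// height="30", height='30' or height="30px"
preg_match_all('#\sheight\s*=\s*["\']?\s*([0-9]+)#i', $message, $attrMatches);

$total = array_sum($styleMatches[1]) + array_sum($attrMatches[1]);
echo $total, "\n"; // 110
```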
HamZa