I set myself this rather simple-sounding challenge, but now I am stuck trying to figure out how to inject a class name into the <body> DOM element of my document.

The complexity is because I don't have control over the HTML markup I am getting via the file_get_contents function (a third party feeds the files via FTP).

So the body element could take many different forms, for example:

<body>
<body id="my-id" data-attribute="content">
<body data-attribute="content">
<body class="already-existing-class" id="my-id" data-attribute="content">

and so on. Not even the order of the attributes is under my control, so class= may come before id=, et cetera.

I think you all understand the complexity I am talking about here (I hope).

What I basically need is a way to use preg_replace() to inject a new class into either an existing class attribute on the body (if one already exists) or add the class attribute itself with my new class in it.

Any help would be much appreciated.

If this has already been answered, please feel free to point it out. I tried searching but with such generic terms it was hard to find what I was looking for.

Thanks for reading.

J.

Jannis
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – grahamparks Dec 15 '11 at 01:26
  • Obligatory link to [The Answer](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) from @grahamparks linked question. – Stephen P Dec 15 '11 at 01:37
  • Triple-linking the same answer must be a record... – deceze Dec 15 '11 at 01:45
  • Love it. Triple linked means it must be right :) In all seriousness though, I appreciate all of you pointing out the error of my ways. I'm new to PHP so there's lots to be learned. Thanks. – Jannis Dec 15 '11 at 01:51

4 Answers

For a regex-only solution that comes close, this works as long as extra spaces don't bother you ;-)

<?php

$pat = '/(<body) ?(([^>]*)class="([^"]*)")?/';
$inp = '<body>
<body id="my-id" data-attribute="content">
<body data-attribute="content">
<body class="already-existing-class" id="my-id" data-attribute="content">
<body id="my-id" data-attribute="content" class="abc">';

echo preg_replace($pat, '$1 $3 class="$4 new-class" ', $inp);

?>

Check ideone for the output.

Brigand
  • Thanks, this did work great however I am marking the HTML Parse answer as correct because it seems to be best practice to not use regexps for this sort of thing as I've now learned. – Jannis Dec 15 '11 at 02:31
  • @Jannis, of course. This was only meant as an alternate strategy/solution. I'd recommend the HTML parser as well. – Brigand Dec 15 '11 at 02:34

A regex can be extremely cumbersome for this application. Instead, I suggest you use an HTML parser, such as PHP's DOMDocument. Here is an example.

$node1 = '<body>';
$node2 = '<body id="my-id" data-attribute="content">';
$node3 = '<body data-attribute="content">';
$node4 = '<body class="already-existing-class" id="my-id" data-attribute="content">';

foreach( range( 1, 4) as $i)
{
    $var = 'node'.$i;
    $doc = new DOMDocument();
    $doc->loadHTML( $$var);
    foreach( $doc->getElementsByTagName( 'body') as $tag)
    {
        $tag->setAttribute('class', ($tag->hasAttribute('class') ? $tag->getAttribute('class') . ' ' : '') . 'some-new-class');
    }
    echo htmlentities( $doc->saveHTML()) . "\n";
}

Notice the output of the <body> tag is correct. You (or another SO member) are free to determine how to extract just the body tag from the DOMDocument.
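For the case in the question, where a whole third-party file is read in, the same idea can be wrapped in a small function. This is only a sketch: the function name `addBodyClass` and the sample markup are made up for illustration, and it assumes PHP's DOM extension is available. It returns the full re-serialized document rather than just the body tag, and the parser may normalize the markup along the way (for example by adding a doctype).

```php
<?php
// Hypothetical helper: parse $html, add $newClass to <body>, return the document.
function addBodyClass(string $html, string $newClass): string
{
    // Third-party markup is often imperfect; collect parser warnings
    // instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if (!$body instanceof DOMElement) {
        return $html; // no <body> found, leave the input untouched
    }

    // Split the existing class list (if any) and append the new class once.
    $classes = preg_split('/\s+/', $body->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
    if (!in_array($newClass, $classes, true)) {
        $classes[] = $newClass;
    }
    $body->setAttribute('class', implode(' ', $classes));

    return $doc->saveHTML();
}

// Sample input standing in for the result of file_get_contents().
$html = '<html><body class="already-existing-class" id="my-id" data-attribute="content"><p>Hello</p></body></html>';
echo addBodyClass($html, 'new-class');
```

Because the attribute is set through the DOM, the position of class= relative to id= or the data attributes in the source no longer matters.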

nickb
  • Thanks @nickb, been trying to get this to work for the past hour (your example worked great but I also want to inject a bunch of HTML into the page, which I cannot figure out) so I'm marking this as correct, because it works perfectly for my given question. Now back to trying to figure out how to insert a string of HTML into a page. – Jannis Dec 15 '11 at 02:30
$str = '<body>
<body id="my-id" data-attribute="content">
<body data-attribute="content">
<body class="already-existing-class" id="my-id" data-attribute="content">
';

$my_new_class = "HELLO_WORLD";
preg_match_all("/<body(.*?)>/is", $str, $m);
$s = sizeof($m[1]);
for($i=0; $i<$s; $i++){
    $m[1][$i] = preg_replace("/class=\"(.*?)\"/is", "class=\"".$my_new_class."\"", $m[1][$i]);
    if(!preg_match("/class=/is", $m[1][$i])){
        $m[1][$i] .= " class=\"".$my_new_class."\"";
    }
    $m[1][$i] = "<body".$m[1][$i].">";
}

print_r($m);

Output:

[1] => Array
    (
        [0] => <body class="HELLO_WORLD">
        [1] => <body id="my-id" data-attribute="content" class="HELLO_WORLD">
        [2] => <body data-attribute="content" class="HELLO_WORLD">
        [3] => <body class="HELLO_WORLD" id="my-id" data-attribute="content">
    )
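Note that the loop above replaces an existing class value instead of adding to it. A possible variant that appends is sketched below. The helper name `appendBodyClass` is hypothetical, and the sketch assumes double-quoted attribute values and no `>` character inside any attribute; it is still regex-based, so it shares the general fragility of matching HTML with patterns.

```php
<?php
// Hypothetical helper (not part of the answer above): appends $newClass to
// the class attribute of every <body ...> tag found in $html.
function appendBodyClass(string $html, string $newClass): string
{
    $result = preg_replace_callback(
        '/<body\b([^>]*)>/i',
        function (array $m) use ($newClass): string {
            $attrs = $m[1];
            // The lookbehind keeps attributes such as data-class="..." from matching.
            $pattern = '/(?<![\w-])class="([^"]*)"/i';
            if (preg_match($pattern, $attrs)) {
                // Existing class attribute: keep its value and add the new class.
                $attrs = preg_replace_callback(
                    $pattern,
                    function (array $c) use ($newClass): string {
                        return 'class="' . trim($c[1] . ' ' . $newClass) . '"';
                    },
                    $attrs,
                    1
                );
            } else {
                // No class attribute yet: add one at the end of the tag.
                $attrs .= ' class="' . $newClass . '"';
            }
            return '<body' . $attrs . '>';
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

echo appendBodyClass('<body class="already-existing-class" id="my-id" data-attribute="content">', 'HELLO_WORLD');
```

Using callbacks rather than a replacement string also means a `$` or backslash in the class name is not interpreted as a backreference.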
jmp
  • `[3]` output should be `<body class="already-existing-class HELLO_WORLD" id="my-id" data-attribute="content">` to meet the OP's requirements. – Stephen P Dec 15 '11 at 01:42
  • Thanks, I appreciate your time putting this answer together, I'm going with the HTML Parser solution for now since it seems to be best practice to avoid regex and html. – Jannis Dec 15 '11 at 02:32

The regex should be changed, because anything after the class="" attribute is missing from the match:

/(<ul) ?(([^>]*)class="([^"]*)"([^>]*))?/

Test code is below. You can replace ul with body to apply it to the body tag.

    <?php
    $pattern = '/(<ul) ?(([^>]*)class="([^"]*)"([^>]*))?/';
    $input_string = '<ul id="test" data-content="the content" class="children" data-compare="equal"><li> test</li></ul>';

    echo preg_replace($pattern, '$1 $3 class="$4 new-class" $5 ', $input_string);

    ?>

In the image you can see the content of each captured group ($1..$5).

[Image: Working RegEx expression to add a class name]

Example can be tested here https://regex101.com/r/yjQe6G/1

backups