I want to do the following, but I cannot work out how preg_replace can do it. So far I have only managed to insert fixed text, not read the existing content first and then add something based on it.

I have a site with 300,000+ pages, and we are trying to build a sidebar of anchor links. First I need to add an ID to all my h2 tags. For example, I have <h2>this is the title</h2>, and I need PHP to output <h2 id="this-is-the-title">this is the title</h2> automatically when the page renders.

Unfortunately, all attempts have failed. I have tried Google, but it is hard to search for because I am not sure what this technique is called.

Any ideas on what this is called or code snippets?

  • If you have *300,000+ pages* as actual files, then you're doing something wrong. Also, most IDEs have a search-and-replace function if you need to change it in the source code. – Rizier123 Mar 14 '16 at 01:53
  • 300,000+ database pages; they're all fetched from a DB, so I cannot simply do a search and replace on a database. – Brodey Sheppard Mar 14 '16 at 01:56
  • Okay, then you should be able to do just this: `$new = str_replace("

    ", "

    ", $dataFromDB);`

    – Rizier123 Mar 14 '16 at 01:57
  • You want to add an ID to every H2 tag that is equal to the contents of the H2 tag itself? Do you have to do this with PHP? It seems you might have better luck doing this with jQuery. – ThrowBackDewd Mar 14 '16 at 01:59
  • Rizier123, thanks, but this won't work: the id is dynamic, not static. It needs to read what is inside each h2 tag first and then do the replacement. ThrowBackDewd, yes, this is exactly what I want, and yes, it needs to be PHP. – Brodey Sheppard Mar 14 '16 at 02:04
  • is "this is the title" coming from the DB to start with? If so, you could still do something like Rizier stated. `$newString = str_replace("

    ", "

    , $oldh2string)`. It's hard to know if this will work for you though as I'm not sure how you're creating the original h2 string, in other words, how much of it is hard coded vs what's dynamic.

    – ThrowBackDewd Mar 14 '16 at 02:11
  • It is, but it's not a variable being output into the content; it's inside a $content variable holding a thousand-word article. – Brodey Sheppard Mar 14 '16 at 03:01

1 Answer

The rule, as usual, is: don't use regular expressions on HTML. Use DOMDocument for this.

First create a function to generate the title-based id:

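function value2id( $text )
{
    // Collapse each run of whitespace into one hyphen (an id must not contain whitespace).
    $retval = preg_replace( '/\s+/', '-', trim( $text ) );
    // Prefix a letter when the id would not start with one.
    if( preg_match( '/^[^a-z]/i', $retval ) ) $retval = "a$retval";
    return $retval;
}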

For non-empty headings, the function above returns an id that is valid in HTML5. Older HTML versions place more restrictions on the characters allowed in an id, so modify the function as needed.
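As one possible modification, here is a minimal sketch of a stricter variant that lowercases the text and drops punctuation, which matches the `this-is-the-title` form asked for in the question. The name `slugifyHeading` and the `section` fallback are assumptions made for illustration, and non-ASCII letters are simply replaced by hyphens here, so titles in other scripts would need different handling.

```php
<?php
// Hypothetical stricter variant of value2id(): lowercase, ASCII letters and digits only.
function slugifyHeading(string $text): string
{
    $slug = strtolower(trim($text));
    // Replace every run of characters other than a-z and 0-9 with one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Remove hyphens left over at either end.
    $slug = trim($slug, '-');
    // Fall back to a fixed name (an assumption) when nothing usable is left.
    return $slug === '' ? 'section' : $slug;
}

echo slugifyHeading('This is the title'), "\n"; // prints: this-is-the-title
```

Restricting the id to lowercase letters, digits, and hyphens also keeps it safe to use in URLs and CSS selectors without escaping.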

Then load your entire page into a DOMDocument object (I don't know whether the db holds the complete document or only the <body>), find all <h2> elements, and add an id attribute to each by calling the custom function:

$dom = new DOMDocument();
// Collect parser warnings internally instead of emitting them for imperfect markup.
libxml_use_internal_errors( true );
$dom->loadHTML( $html );

foreach( $dom->getElementsByTagName( 'h2' ) as $h2 )
{
    $h2->setAttribute( 'id', value2id( $h2->nodeValue ) );
}

Now you can print the modified HTML with:

echo $dom->saveHTML();
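If the database holds only an article fragment (as the `$content` variable mentioned in the comments suggests), `loadHTML()` wraps it in `<html>` and `<body>`, and `saveHTML()` with no argument prints that whole document. The sketch below is one way to get only the fragment back, and it also numbers repeated headings so ids stay unique. The names `addHeadingIds` and `makeId` and the `-2`, `-3` suffix scheme are assumptions for illustration, and character-encoding handling for non-ASCII content is left out.

```php
<?php
// Sketch: add ids to every <h2> in an HTML fragment and return only the fragment.
// addHeadingIds() and makeId() are hypothetical names, not part of the answer above.

function makeId(string $text): string
{
    $id = preg_replace('/\s+/', '-', trim($text));
    // Prefix a letter when the id would not start with one.
    return preg_match('/^[a-z]/i', $id) ? $id : "a$id";
}

function addHeadingIds(string $fragment): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect warnings quietly
    // A wrapper <div> gives one known container to read the result back from.
    $dom->loadHTML('<div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $seen = [];
    foreach ($dom->getElementsByTagName('h2') as $h2) {
        $id = makeId($h2->textContent);
        // Append -2, -3, ... when two headings produce the same id.
        $seen[$id] = ($seen[$id] ?? 0) + 1;
        if ($seen[$id] > 1) {
            $id .= '-' . $seen[$id];
        }
        $h2->setAttribute('id', $id);
    }

    // Serialize only the wrapper's children, skipping <html>, <body> and the wrapper.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo addHeadingIds('<h2>this is the title</h2><p>Some text.</p>');
```

The serializer may add or drop whitespace between elements, so the returned markup is equivalent to the input rather than byte-for-byte identical apart from the new attributes.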