
I would like to change this using preg_match:

<li class="fte_newsarchivelistleft" style="clear: both; padding-left:0px;"><a class="fte_standardlink fte_edit" href="news,2480143,3-kolejka-sezonu-2014-2015.html">3 kolejka sezonu 2014/2015&nbsp;&raquo;&raquo;</a></li>
                      <li class="fte_newsarchivelistright" style="height: 25px;">komentarzy: <span class="fte_standardlink">[0]</span></li>

To this:

news,2480143,3-kolejka-sezonu-2014-2015.html

How can I do it? I'm trying with preg_match but that link is too complicated...

user3898993

1 Answer


Using preg_match would indeed be too complicated. As has been stated on this site many times before, regex and HTML don't mix well: regex is not suited to processing markup. A DOM parser, however, is:

$dom = new DOMDocument;//create parser
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);//create XPath instance for dom, so we can query using xpath
$elemsWithHref = $xpath->query('//*[@href]');//get any node that has an href attribute
$hrefs = array();//all href values
foreach ($elemsWithHref as $node)
{
    $hrefs[] = $node->getAttributeNode('href')->value;//assign values
}

After this, it's a simple matter of processing the values in $hrefs, which will be an array of strings, each of which is the value of an href attribute.
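To make that concrete, here is a minimal, self-contained sketch that runs the same approach against a trimmed copy of the markup from the question. The function name extractHrefs is hypothetical (it is not part of the answer above), and the libxml_use_internal_errors calls are an added assumption, there only to silence parser warnings about the fragment not being a complete document.

```php
<?php
// Hypothetical helper (not from the original answer): collect every href value in an HTML string.
function extractHrefs(string $html): array
{
    $dom = new DOMDocument;

    // Assumption: collect libxml warnings internally so imperfect markup stays quiet.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $hrefs = array();
    // Same query as above: any element that carries an href attribute.
    foreach ($xpath->query('//*[@href]') as $node) {
        $hrefs[] = $node->getAttribute('href');
    }
    return $hrefs;
}

// Trimmed version of the markup from the question (style attributes and entities dropped).
$htmlString = '<li class="fte_newsarchivelistleft"><a class="fte_standardlink fte_edit" href="news,2480143,3-kolejka-sezonu-2014-2015.html">3 kolejka sezonu 2014/2015</a></li>'
    . '<li class="fte_newsarchivelistright">komentarzy: <span class="fte_standardlink">[0]</span></li>';

print_r(extractHrefs($htmlString));
```

For this input the array should hold a single string, news,2480143,3-kolejka-sezonu-2014-2015.html, which is the value the question asks for.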

Another example of using DOM parsers and XPath (to show you what they can do) can be found here

To replace the nodes with the href values, it's a simple matter of:

  • Getting the parent node
  • constructing a text-node
  • calling DOMNode::replaceChild on the parent node
  • Finishing up by calling save to write to a file, or saveHTML or saveXML to get the DOM as a string

An example:

$dom = new DOMDocument;//create parser
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);//create XPath instance for dom, so we can query using xpath
$elemsWithHref = $xpath->query('//*[@href]');//get any node that has an href attribute
foreach ($elemsWithHref as $node)
{
    $parent = $node->parentNode;
    $replace = new DOMText($node->getAttributeNode('href')->value);//create text node
    $parent->replaceChild($replace, $node);//replaces $node with $replace textNode
}
$newString = $dom->saveHTML();
Elias Van Ootegem
  • @user3898993: If ever you think of regex when processing markup, just remember: [it summons Cthulhu](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)... it's a kind of legendary answer here :) – Elias Van Ootegem Sep 01 '14 at 15:21