Is there a good way to convert HTML in the following format

            <span xmlns:v="http://rdf.data-vocabulary.org/#">
                <span typeof="v:Breadcrumb">
                    <a href="http://link1.com/" rel="v:url" property="v:title">Home</a>
                </span> 
                / 
                <span typeof="v:Breadcrumb">
                    <a href="http://link2.com/" rel="v:url" property="v:title">Child 2</a>
                </span>
                / 
                <span typeof="v:Breadcrumb">
                    <a href="http://link3.com/" rel="v:url" property="v:title">Child 3</a>
                </span> 
                / 
                <span typeof="v:Breadcrumb">
                    <span class="breadcrumb_last" property="v:title">Child 4</span>
                </span>
            </span>

into

            <span itemscope="" itemtype="http://data-vocabulary.org/Breadcrumb">
                <span typeof="v:Breadcrumb">
                    <a href="http://link1.com/" itemprop="url">
                        <span itemprop="title">Home</span>
                    </a>
                </span> 
                /
                <span typeof="v:Breadcrumb">
                    <a href="http://link2.com/" itemprop="url">
                        <span itemprop="title">Child 2</span>
                    </a>
                </span> 
                / 
                <span typeof="v:Breadcrumb">
                    <a href="http://link3.com/" itemprop="url">
                        <span itemprop="title">Child 3</span>
                    </a>
                </span> 
                / 
                <span>
                    <span class="breadcrumb_last">
                        <span itemprop="title">Child 4</span>
                    </span>
                </span>
            </span>

with PHP? I want to convert a breadcrumb structure from RDFa to Microdata. Thank you for the help.

Shoe
rahul Ram
  • 3
    Please refrain from parsing HTML with RegEx as it will [drive you į̷̷͚̤̤̖̱̦͍͗̒̈̅̄̎n̨͖͓̹͍͎͔͈̝̲͐ͪ͛̃̄͛ṣ̷̵̞̦ͤ̅̉̋ͪ͑͛ͥ͜a̷̘͖̮͔͎͛̇̏̒͆̆͘n͇͔̤̼͙̩͖̭ͤ͋̉͌͟eͥ͒͆ͧͨ̽͞҉̹͍̳̻͢](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use an [HTML parser](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) instead. – Madara's Ghost Apr 08 '13 at 20:18
  • @MadaraUchiha: the RDFa data is obtained inside WordPress itself, so I can't load an HTML parser just for this purpose alone. – rahul Ram Apr 08 '13 at 20:19
  • 1
    Why would that stop you using an HTML parser? [`DOMDocument`](http://php.net/domdocument), job done. – DaveRandom Apr 08 '13 at 20:20
  • @rahulRam: If you get a string, you can very much load an HTML parser. `DOMDocument` is included with any decently updated version of PHP. – Madara's Ghost Apr 08 '13 at 20:20
  • The text should be replaceable with `str_replace` alone... – bwoebi Apr 08 '13 at 20:21
  • **Don't use regular expressions to parse HTML**. You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. – Andy Lester Apr 08 '13 at 20:41

1 Answer


Here is a regex solution. It works with your example code, but it fails as soon as the attribute order changes:

 // Optional rel="v:url", then property="v:title" and the element text up to the next "<".
 $pattern = '#(?:rel="v:url")? property="v:title">([^<]*)<#ui';
 $replacement = ' itemprop="url"><span itemprop="title">$1</span><';
 $output = preg_replace($pattern, $replacement, $original);
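
To make the attribute-order caveat concrete, here is a small sketch. It assumes the intended expression is the pattern above written with plain quoting; the two sample strings are made up for illustration. With `rel` before `property` the link is rewritten, while the swapped order comes back unchanged because nothing matches.

```php
<?php
// Assumed form of the pattern from the answer, written without extra escaping.
$pattern = '#(?:rel="v:url")? property="v:title">([^<]*)<#ui';
$replacement = ' itemprop="url"><span itemprop="title">$1</span><';

// Illustrative inputs: same link, attributes in two different orders.
$ordered = '<a href="http://link1.com/" rel="v:url" property="v:title">Home</a>';
$swapped = '<a href="http://link1.com/" property="v:title" rel="v:url">Home</a>';

$converted = preg_replace($pattern, $replacement, $ordered);
$unchanged = preg_replace($pattern, $replacement, $swapped);

var_dump($converted !== $ordered); // the ordered input was rewritten
var_dump($unchanged === $swapped); // the swapped input was left as-is
```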

Whenever possible, use an HTML/XML parser to manipulate HTML/XML source. One powerful tool is phpQuery: https://code.google.com/p/phpquery/. If you already use the jQuery JavaScript framework, it will feel familiar ;) See:

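require_once 'phpquery/phpQuery.php';
$dom = phpQuery::newDocument($original);
// Iterating a phpQuery result yields plain DOM nodes, so wrap each one with pq().
foreach ($dom->find('a[rel="v:url"]') as $node) {
    $item = pq($node);
    $txt = htmlspecialchars($item->text());
    $item->
       removeAttr('rel')->
       removeAttr('property')->
       attr('itemprop', 'url')->
       html("<span itemprop=\"title\">$txt</span>");
}
// Serialize the modified document rather than the untouched input string.
$output = $dom->htmlOuter();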
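
If adding a library is not an option, the built-in `DOMDocument` mentioned in the comments can probably do the same job. The following is only a sketch under a few assumptions: PHP 7+ with the DOM extension enabled, the function name `rdfaBreadcrumbToMicrodata` is hypothetical, and it only rewrites the `property="v:title"` elements and their `rel="v:url"` links. The outer wrapper (`xmlns:v`, `itemscope`, `itemtype`) is left untouched.

```php
<?php
// Hypothetical helper (not from the original answer): rewrites RDFa
// breadcrumb titles and links to Microdata attributes using ext-dom.
function rdfaBreadcrumbToMicrodata(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // Wrap the fragment in a <div> so its contents can be found again after
    // loadHTML() adds the implied <html>/<body> elements.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $matches = iterator_to_array($xpath->query('//*[@property="v:title"]'));
    foreach ($matches as $node) {
        $node->removeAttribute('property');

        // Move the element's existing children into <span itemprop="title">.
        $title = $doc->createElement('span');
        $title->setAttribute('itemprop', 'title');
        while ($node->firstChild !== null) {
            $title->appendChild($node->firstChild);
        }
        $node->appendChild($title);

        // Only links that carried rel="v:url" become itemprop="url".
        if ($node->nodeName === 'a' && $node->getAttribute('rel') === 'v:url') {
            $node->removeAttribute('rel');
            $node->setAttribute('itemprop', 'url');
        }
    }

    // Serialize only the children of the wrapper <div>.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

It would be called as `$output = rdfaBreadcrumbToMicrodata($original);`. Because it matches on attribute values rather than on text position, attribute order should not matter here. Note that `loadHTML()` does not assume UTF-8 by default, so non-ASCII titles may need an explicit charset hint.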
Kovge