
I have some XML that looks like this. I've loaded it into a string in PHP:

<sense>
<gloss>there</gloss>
<gloss>over there</gloss>
<gloss>that place</gloss>
<gloss>yonder</gloss>
</sense>
<sense>
<gloss>that far</gloss>
<gloss>that much</gloss>
<gloss>that point</gloss>
</sense>

I'm trying to format it to look like this:

<sense>
<gloss>there|over there|that place|yonder&that far|that much|that point</gloss>
</sense>

I've almost managed to do this with the following code (there's probably a smarter way to do this, but still...):

preg_match_all('~<gloss>(.*)</gloss>~sU', $input, $matches);

$newStr = '';
//Add all new matches and put them in a new string
for ($i=0; isset($matches[1][$i]); $i++)
{
    $newStr .= $matches[1][$i].'|';
}

But how would I separate the glosses of the two different <sense> elements with a "&" (or any other separator)?

Asked by Harpo (edited by halfer)

  • Use simplexml or DOM, not regular expressions. – zerkms Jan 13 '14 at 03:54
  • The trouble with your question is that you have selected an inappropriate tool for the job at hand. Trying to parse XML-like input with regexps will send you to a world of pain for no gain. – kuroi neko Jan 13 '14 at 04:00
  • [Relevant](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – Daedalus Jan 13 '14 at 04:03

3 Answers


Use the DOMDocument class. It's pretty easy!

Also, don't try to parse HTML (or XML) with regular expressions; it is not advisable.

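<?php
$html = '<sense>
<gloss>there</gloss>
<gloss>over there</gloss>
<gloss>that place</gloss>
<gloss>yonder</gloss>
</sense>
<sense>
<gloss>that far</gloss>
<gloss>that much</gloss>
<gloss>that point</gloss>
</sense>';
$dom = new DOMDocument;
// loadHTML() accepts the two top-level <sense> elements; @ hides its warnings about the unknown tags.
@$dom->loadHTML($html);

$str = ''; // initialise the accumulator, otherwise PHP complains about an undefined variable
foreach ($dom->getElementsByTagName('sense') as $tag) {
    // Join the glosses of this <sense> with "|".
    foreach ($tag->getElementsByTagName('gloss') as $intag) {
        $str .= $intag->nodeValue . "|";
    }
    $str = rtrim($str, '|');
    // Separate consecutive <sense> groups with "&".
    $str .= "&";
}

echo "<sense><gloss>" . rtrim($str, '&') . "</gloss></sense>";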

Output

there|over there|that place|yonder&that far|that much|that point

If you view the page source, you will see this:

<sense><gloss>there|over there|that place|yonder&that far|that much|that point</gloss></sense>
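The comments mention trying `loadXML()` instead. That method requires a single root element, so it rejects this two-`<sense>` fragment as it stands. A minimal sketch, assuming the input is otherwise well-formed XML: wrap it in an arbitrary `<root>` element (not part of the original data), then use `DOMXPath` to read each sense's glosses. Collecting into arrays and calling `implode()` also avoids the `rtrim()` clean-up. Wrapping does not help with undeclared entity references such as the `Entity 'n' not defined` error; those still have to be declared or replaced before parsing.

```php
<?php
// Sample input from the question: two top-level <sense> elements.
$input = '<sense>
<gloss>there</gloss>
<gloss>over there</gloss>
<gloss>that place</gloss>
<gloss>yonder</gloss>
</sense>
<sense>
<gloss>that far</gloss>
<gloss>that much</gloss>
<gloss>that point</gloss>
</sense>';

// loadXML() needs exactly one root element, so wrap the fragment
// in a hypothetical <root> element before parsing.
$dom = new DOMDocument();
$dom->loadXML('<root>' . $input . '</root>');

$xpath = new DOMXPath($dom);
$senses = array();
foreach ($xpath->query('/root/sense') as $sense) {
    $glosses = array();
    // Relative query: only the <gloss> children of the current <sense>.
    foreach ($xpath->query('gloss', $sense) as $gloss) {
        $glosses[] = $gloss->textContent;
    }
    $senses[] = implode('|', $glosses);
}

// Glosses joined with "|", senses joined with "&", as the question asks.
$result = '<sense><gloss>' . implode('&', $senses) . '</gloss></sense>';
echo $result, "\n";
```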
Answered by Shankar Narayana Damodaran (edited by halfer)
  • `gloss` elements from different `sense` elements are supposed to be separated with `&` – zerkms Jan 13 '14 at 04:03
  • @zerkms Good catch. I've modified it now. Thanks. – Shankar Narayana Damodaran Jan 13 '14 at 06:13
  • Hm. I used your code exactly. I have my XML in a PHP string, not an XML document, because I need to format it further after I've done this. I loaded the string like this: `$dom = new DOMDocument(); $dom->loadXML($input);` but I get this error: `DOMDocument::loadXML(): Entity 'n' not defined in Entity, line: 10` – Harpo Jan 13 '14 at 08:09
  • Sweet. Works like a charm after some messing around. Thank you! – Harpo Jan 13 '14 at 10:16

As kuroi neko's comment said, an XML library is probably the best tool for the job here. This probably isn't the most efficient code, but it's pretty straightforward and easy to use.

$xml = simplexml_load_string('
    <root>
        <sense>
            <gloss>there</gloss>
            <gloss>over there</gloss>
            <gloss>that place</gloss>
            <gloss>yonder</gloss>
        </sense>
        <sense>
            <gloss>that far</gloss>
            <gloss>that much</gloss>
            <gloss>that point</gloss>
        </sense>
    </root>
');

$senses = array();
foreach ($xml->sense as $sense) {
    $glosses = array();
    foreach ($sense->gloss as $gloss) {
        $glosses[] = (string) $gloss;
    }
    $senses[] = implode('|', $glosses);
}

$result = '<sense>'.implode('</sense><sense>', array_map('htmlspecialchars', $senses)).'</sense>';

$result will then contain:

<sense>there|over there|that place|yonder</sense><sense>that far|that much|that point</sense>
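The `$result` above puts each sense in its own `<sense>` element, whereas the question asks for all glosses inside one `<gloss>`, with `|` between glosses and `&` between senses. A sketch adapting the same SimpleXML loop to that format. It also builds an escaped variant, because a bare `&` makes the output invalid XML if it ever has to be parsed again; which one you want depends on what consumes the string.

```php
<?php
$xml = simplexml_load_string('<root>
    <sense>
        <gloss>there</gloss>
        <gloss>over there</gloss>
        <gloss>that place</gloss>
        <gloss>yonder</gloss>
    </sense>
    <sense>
        <gloss>that far</gloss>
        <gloss>that much</gloss>
        <gloss>that point</gloss>
    </sense>
</root>');

$senses = array();
foreach ($xml->sense as $sense) {
    $glosses = array();
    foreach ($sense->gloss as $gloss) {
        $glosses[] = (string) $gloss; // cast the SimpleXMLElement to its text
    }
    $senses[] = implode('|', $glosses);
}

// Exactly the format from the question: a raw "&" between senses.
$joined = '<sense><gloss>' . implode('&', $senses) . '</gloss></sense>';

// Same text with "&" escaped as "&amp;", so the result is well-formed XML.
$escaped = '<sense><gloss>' . htmlspecialchars(implode('&', $senses)) . '</gloss></sense>';

echo $joined, "\n", $escaped, "\n";
```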
Answered by Joel Cox

Explode your string on `<sense>` so that each chunk holds one sense, then loop over the chunks with a regular expression:

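$text = "<sense>
<gloss>there</gloss>
<gloss>over there</gloss>
<gloss>that place</gloss>
<gloss>yonder</gloss>
</sense>
<sense>
<gloss>that far</gloss>
<gloss>that much</gloss>
<gloss>that point</gloss>
</sense>";
$string = array();
// Split on the opening <sense> tag: the first chunk is empty, each later chunk holds one sense.
// A plain foreach avoids passing explode()'s return value to array_walk() by reference.
foreach (explode("<sense>", $text) as $part) {
    // Non-greedy match: capture the text of every <gloss> in this chunk.
    preg_match_all("@<gloss>(.*?)</gloss>@", $part, $match);
    // Skip chunks without glosses, such as the empty one before the first <sense>.
    if (count($match[1]) > 0) {
        $string[] = implode("|", $match[1]);
    }
}
echo "<sense><gloss>".implode("&", $string)."</gloss></sense>";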

Output:

<sense><gloss>there|over there|that place|yonder&that far|that much|that point</gloss></sense>
Answered by revo