I'm trying to make a visual representation of XML code in HTML.

A simple case is this:

Original source:

<g id="1">some content</g> other content <s/>

Desired output:

<span data-id="1">&lt;g id=&quot;1&quot;&gt;</span>some content<span data-closingof="1">&lt;/g&gt;</span> other content <span>&lt;s/&gt;</span>

I have tried a lot with regex and got good results, but it fails for nested elements.

Is there any other way? (For example, an XML parser that allows such transformations.)

Thank you.

albertos
  • Escaping XML in JavaScript has been discussed in [this question](https://stackoverflow.com/questions/7918868/how-to-escape-xml-entities-in-javascript). In PHP you could have luck with [htmlspecialchars](http://php.net/manual/en/function.htmlspecialchars.php). – Sami Hult Jan 25 '19 at 16:16
  • I altered the above code a bit; simple escaping is not really what I want. I'm hoping to create matching HTML elements (e.g. with data attributes). That's why I'm asking for something more than regex or escaping. – albertos Jan 25 '19 at 16:33
  • It may be overkill, but have you considered a SAX parser like [this](http://php.net/manual/en/book.xml.php)? – Sami Hult Jan 25 '19 at 16:46
  • I found that XML Parser has "xml_set_element_handler" which is something I was hoping for. I will give it a try. Thank you for your suggestion. – albertos Jan 25 '19 at 16:54
  • It should be a solid approach, if a bit heavy :) – Sami Hult Jan 25 '19 at 16:55

3 Answers

Unusual for me to suggest regex for XML processing, but this may be more appropriate.

$input = '<g id="1">some content</g> other content <s/>';
echo preg_replace_callback("/(<.*?>)/", function($tag) {
        return "<span>".htmlentities($tag[1])."</span>";
    },
    $input);

This looks for anything between < and > and encodes it, while enclosing it in <span> tags.

Outputs...

<span>&lt;g id=&quot;1&quot;&gt;</span>some content<span>&lt;/g&gt;</span> other content <span>&lt;s/&gt;</span>

As this is only a limited example, this may not fit all sizes, but may be worth a go.

Update:

Now that the question asks for data-id, I've updated the code. It keeps a stack of the currently open tags and adds data-closingof when a matching close tag is found (although it doesn't check the type of tag). It ignores any self-closed tags, as these don't have any other content.

$input = '<g id="1">some <g>2</g>content</g> other content <s/>';
$tagID = [];
echo preg_replace_callback("/(<.*?>)/", function($tag) use (&$tagID) {
    if ( substr($tag[1], -2) == "/>" ) {
        $out = "<span>".htmlentities($tag[1])."</span>";
    }
    else {
        $add = "";
        if ( substr($tag[1],0,2) == "</" )    {
            $id = array_pop($tagID);
            if ( !empty($id) )  {
                $add = ' data-closingof="'.$id.'"';
            }
        }
        else    {
            if (preg_match('/id="(.*?)"/', $tag[1], $match)) {
                $id = $match[1];
                $add = ' data-id="'.$id.'"';
            }
            else {
                $id = "";
            }
            array_push($tagID, $id);
        }
        $out = "<span{$add}>".htmlentities($tag[1])."</span>";
    }
    return $out;
},
$input);
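
The update above pairs each closing tag with whatever was opened last, without checking the type of tag. A minimal sketch of one way to add that check, assuming the stack stores the tag name next to the id; `wrapTags()` is a hypothetical name, and this is still regex matching rather than real XML parsing, so treat it as a starting point only:

```php
<?php
// Sketch: wrap each tag in a <span>, pairing a closing tag with its opener by tag name.
// wrapTags() is a hypothetical helper, not part of the code above.
function wrapTags(string $input): string
{
    $stack = []; // open tags still waiting for their close, as [name, id] pairs

    return preg_replace_callback('/<[^>]*>/', function (array $m) use (&$stack) {
        $tag = $m[0];
        $add = '';

        if (substr($tag, -2) === '/>') {
            // Self-closed tag: nothing to pair, so the stack is left untouched.
        } elseif (preg_match('~^</\s*([^\s>]+)~', $tag, $close)) {
            // Closing tag: only pop when the innermost open tag has the same name.
            $top = end($stack);
            if ($top !== false && $top[0] === $close[1]) {
                array_pop($stack);
                if ($top[1] !== '') {
                    $add = ' data-closingof="' . htmlspecialchars($top[1]) . '"';
                }
            }
        } elseif (preg_match('~^<\s*([^\s>/]+)~', $tag, $open)) {
            // Opening tag: remember its name and id (empty string when it has no id).
            $id = preg_match('/\bid="([^"]*)"/', $tag, $idMatch) ? $idMatch[1] : '';
            if ($id !== '') {
                $add = ' data-id="' . htmlspecialchars($id) . '"';
            }
            $stack[] = [$open[1], $id];
        }

        return "<span{$add}>" . htmlspecialchars($tag) . '</span>';
    }, $input);
}

echo wrapTags('<g id="1">some <g>2</g>content</g> other content <s/>');
```

A closing tag whose name does not match the innermost open tag is still wrapped in a plain `<span>`, but it gets no `data-closingof` and leaves the stack alone.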
Nigel Ren

I ended up with something like this.

class TagsConverter {
    private $parser;

    private $nestedIDs = [];
    private $output = '';

    function __construct() {
        $this->parser = xml_parser_create();
        xml_set_object($this->parser, $this);
        xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($this->parser, "tagOpen", "tagClose");
        xml_set_character_data_handler($this->parser, "tagData");
    }

    function __destruct() {
        xml_parser_free($this->parser);
        unset($this->parser);
    }

    function reset() {
        $this->output = '';
        $this->nestedIDs = []; // clear the id stack (the property is nestedIDs)
    }

    function transform($xml) {
        $xml = '<root>' . $xml . '</root>';
        xml_parse($this->parser, $xml, true);

        $finalOutput = $this->output;

        $this->reset();

        return $finalOutput;
    }

    function tagOpen($parser, $tag, $attributes) {
        switch($tag) {
            case "bx":
                $this->output .= '<span>' . htmlentities('<bx />') . "</span>";
                break;
            case "g":
                // Push an entry for every <g> (empty string when it has no id),
                // so tagClose() always pops the entry of the tag being closed.
                $id = $attributes["id"] ?? '';
                $this->nestedIDs[] = $id;
                $this->output .= '<span data-id="' . $id .'">' . htmlentities('<g id="'.$id.'">') . "</span>";
                break;
            default:
                break;
        }
    }

    function tagData($parser, $cdata) {
        $this->output .= $cdata;
    }

    function tagClose($parser, $tag) {
        switch($tag) {
            case "g":
                $id = array_pop($this->nestedIDs);
                $this->output .= '<span data-closingof="' . $id .'">' . htmlentities('</g>') . "</span>";
                break;
            default:
                break;
        }
    }
}

Example run:

$p = new TagsConverter();
echo $p->transform('<g id="1">test g <g id="2">222</g></g> <g id="3">3333</g> other <x/> content <g id="4">444</g> <bx/>');

<span data-id="1">&lt;g id=&quot;1&quot;&gt;</span>test g <span data-id="2">&lt;g id=&quot;2&quot;&gt;</span>222<span data-closingof="2">&lt;/g&gt;</span><span data-closingof="1">&lt;/g&gt;</span> <span data-id="3">&lt;g id=&quot;3&quot;&gt;</span>3333<span data-closingof="3">&lt;/g&gt;</span> other  content <span data-id="4">&lt;g id=&quot;4&quot;&gt;</span>444<span data-closingof="4">&lt;/g&gt;</span> <span>&lt;bx /&gt;</span>

I wonder if there is a way to do this in JS.

albertos
  • Just comparing the output of my solution to this, the only difference seems to be that I've included the `<x/>` tag as well. – Nigel Ren Jan 26 '19 at 13:29
  • The final code is far more complicated. It also adds classes and any arguments that may be found in each tag, and it has more complex HTML output. With regex it was working, but it quickly became really complicated to extend. – albertos Jan 26 '19 at 16:11
  • So almost any answer would have not been up to the job as there are lots of other things not in the question which were needed to answer this. I would not normally suggest regex for XML, but as this isn't 'normal' XML it may have fit (especially being able to move it to Javascript). – Nigel Ren Jan 26 '19 at 16:17

You can use this, but I don't know what your XML file looks like, so I can't give you an example with your code.

This parses the XML into a SimpleXMLElement object, which you can traverse much like an array, so you can easily get values out:

    $getXml = file_get_contents('xml.xml');
    $xml = simplexml_load_string($getXml) or die("Error: Cannot create object");

This will loop through the items:

    foreach($xml->channel->item as $item) {
        // here you could do something like this
        echo "<h1>";
        // cast to string and escape it, so the title is printed as text
        echo htmlspecialchars((string) $item->title);
        echo "</h1>";
        echo "<br>";
    }

You can also do this, which will print all elements from the XML:

     print_r($xml);

documentation about simplexml_load_string: https://www.w3schools.com/php/func_simplexml_load_string.asp
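
For the sample in the question, here is a hedged sketch of walking a parsed tree to produce the span output. SimpleXML does not expose the text between child elements in document order, so this sketch swaps in DOMDocument instead. `renderNode()` and `xmlToSpans()` are hypothetical helper names, and the temporary `<root>` wrapper is an assumption needed because the fragment has several top-level nodes:

```php
<?php
// Sketch: parse the fragment with DOMDocument and rebuild each tag as escaped text in a <span>.
function renderNode(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        // Plain text (and CDATA) is escaped and passed through.
        return htmlspecialchars($node->nodeValue);
    }
    if (!($node instanceof DOMElement)) {
        return ''; // comments, processing instructions etc. are skipped in this sketch
    }

    // Rebuild the source form of the tag from its name and attributes.
    $source = '<' . $node->nodeName;
    foreach ($node->attributes as $attr) {
        $source .= ' ' . $attr->nodeName . '="' . htmlspecialchars($attr->nodeValue) . '"';
    }

    if (!$node->hasChildNodes()) {
        // The DOM does not record whether the source was <s/> or <s></s>; both print as <s/>.
        return '<span>' . htmlspecialchars($source . '/>') . '</span>';
    }

    $id    = $node->getAttribute('id'); // empty string when the attribute is missing
    $open  = $id !== '' ? ' data-id="' . htmlspecialchars($id) . '"' : '';
    $close = $id !== '' ? ' data-closingof="' . htmlspecialchars($id) . '"' : '';

    $html = "<span{$open}>" . htmlspecialchars($source . '>') . '</span>';
    foreach ($node->childNodes as $child) {
        $html .= renderNode($child); // recursion handles nesting
    }
    return $html . "<span{$close}>" . htmlspecialchars('</' . $node->nodeName . '>') . '</span>';
}

function xmlToSpans(string $fragment): string
{
    $doc = new DOMDocument();
    // Wrap the fragment in a temporary root so it parses as one document.
    if (!$doc->loadXML('<root>' . $fragment . '</root>')) {
        throw new InvalidArgumentException('Fragment is not well-formed XML');
    }
    $html = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $html .= renderNode($child);
    }
    return $html;
}

echo xmlToSpans('<g id="1">some content</g> other content <s/>');
```

Because the parser tracks nesting itself, no stack of ids is needed here. The tags are rebuilt from the parsed tree, so details such as quoting style or spacing inside the original tags are not preserved.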

If you don't understand, please comment.

TimeParadox
  • Does this transform XML tags to visible HTML text? – Sami Hult Jan 25 '19 at 16:17
  • *I dont know how your xml file looks like* - isn't that the *Original source:*? – Nigel Ren Jan 25 '19 at 16:20
  • The XML is like the one posted. To be really specific, it's an XLIFF document. But I'm only interested in certain tags (the ones allowed in mrk tags). – albertos Jan 25 '19 at 16:21
  • So does your XML file have something like some content other content etc.? – TimeParadox Jan 25 '19 at 16:32
  • Yes, it's wrapped in many tags, but this is the part I want to present. – albertos Jan 25 '19 at 16:44
  • OK, so what I meant by needing your file is the tags wrapped around it, because when you parse it to an array it will look something like array[list][item]([some content][other content ]). So if you want your part, you can use $xml->list->item[0], which gives you the first item, and then you can put HTML tags around it so it is displayed in HTML the way you want. – TimeParadox Jan 25 '19 at 16:48