You can do this, relatively easily even, using DOMDocument to parse the markup and DOMXPath to query for all the comment nodes. Then access each comment's parent node, extract its text content, and list those values as "strings to translate":
    $dom = new DOMDocument;
    $dom->load($file); // XML file; use loadHTMLFile() for an HTML file, loadHTML() for an HTML string
    $xpath = new DOMXPath($dom); // XPath helper bound to this document
    $comments = $xpath->query('//comment()'); // get all comment nodes
    // this array will contain all to-translate texts
    $toTranslate = array();
    foreach ($comments as $comment)
    {
        // trim to ignore surrounding spaces; use strcasecmp(...) === 0 if you need case-insensitive matching
        if (trim($comment->nodeValue) === 'lang')
        {
            $parent = $comment->parentNode; // get parent node
            $toTranslate[] = $parent->textContent; // get parent node's text content
        }
    }
    var_dump($toTranslate);
Note that this can't handle markers used in tag attributes. This simple script extracts the strings that need to be translated from the "regular" markup. After that, you can write a script that looks for <!--lang--> in tag attributes... I'll have a look whether there's a way to do this using XPath, too. For now, this should help you get started, though.
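For the attribute case, one approach that might work is to query the attribute nodes themselves and keep those whose value contains the marker text. This is only a sketch: it assumes the marker appears verbatim as <!--lang--> inside a quoted attribute value and that the markup is loaded with loadHTML. The sample markup and the $attrStrings name are made up for illustration.

```php
<?php
// Hypothetical sample markup: the marker sits inside attribute values
$html = '<form>'
      . '<input type="submit" value="<!--lang-->Send">'
      . '<input type="text" name="q" value="keep me">'
      . '<select><option value="<!--lang-->Yes">y</option></select>'
      . '</form>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Select every attribute node whose value contains the marker text
$attrs = $xpath->query('//@*[contains(., "<!--lang-->")]');

$attrStrings = array();
foreach ($attrs as $attr) {
    // Strip the marker, keep the text that needs translating
    $attrStrings[] = trim(str_replace('<!--lang-->', '', $attr->value));
}
var_dump($attrStrings);
```

Because //@* matches any attribute, this is not limited to value; narrow it to //@value or //@title if only specific attributes carry the marker.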
If you have no comments other than <!--lang--> in your markup, you could simply use an XPath expression that selects the parents of those comment nodes directly. The expression below also picks up input and option tags that have a value attribute:
    $commentsAndInput = $xpath->query('(//input|//option)[@value]|//comment()/..');
    foreach ($commentsAndInput as $node)
    {
        // nodeName exists on every node type, so a top-level comment's parent (the document) is safe too
        if ($node->nodeName !== 'input' && $node->nodeName !== 'option')
        { // parent of a comment: get the textContent of the node
            $toTranslate[] = $node->textContent;
        }
        else
        { // input or option: get the value attribute's value
            $toTranslate[] = $node->getAttribute('value');
        }
    }
The XPath expression explained:
//: tells XPath to search for nodes that match the rest of the criteria anywhere in the DOM
input: literal tag name; //input looks for input tags anywhere in the DOM tree
[@value]: the mentioned tag only matches if it has a value attribute
|: OR (union). //a|//input[@type="button"] matches links OR buttons
//option[@value]: same as above: options with value attributes are matched
(//input|//option): groups both expressions, so the [@value] applies to all matches in this selection
//comment(): selects comments anywhere in the DOM
/..: selects the parent of the current node, so //comment()/.. matches the parent containing the selected comment node
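To see how the pieces behave on their own, it can help to run each sub-expression against a small document and count the matches. The following is a rough sketch; the markup and the $parts and $counts names are invented for the example.

```php
<?php
// Hypothetical markup, only meant to exercise the expressions
$html = '<div>'
      . '<p><!--lang-->Hello</p>'
      . '<input type="text" value="Name">'
      . '<input type="checkbox">'
      . '<select><option value="1">One</option><option>Two</option></select>'
      . '</div>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// The building blocks of the full expression, from simple to combined
$parts = array(
    '//input',
    '//input[@value]',
    '(//input|//option)[@value]',
    '//comment()',
    '//comment()/..',
);

$counts = array();
foreach ($parts as $expr) {
    // DOMNodeList::length is the number of nodes the expression matched
    $counts[$expr] = $xpath->query($expr)->length;
}
print_r($counts);
```

With this sample, //input matches both inputs, while //input[@value] only matches the one that actually carries a value attribute.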
Keep working at the XPath expression to get all of the content you need to translate.
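One possible refinement, in case the markup does contain other comments: move the marker test into the expression with a predicate, so only the parents of lang comments are returned. This is a sketch under the assumption that the comment text is exactly lang once surrounding whitespace is ignored; the sample markup and the $strings name are hypothetical.

```php
<?php
// Hypothetical markup mixing marker comments with an unrelated comment
$html = '<div><!-- layout wrapper -->'
      . '<p><!--lang-->Hello world</p>'
      . '<span><!-- lang -->Goodbye</span>'
      . '<em>not translated</em>'
      . '</div>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// normalize-space() trims the comment text; the predicate keeps only "lang" comments,
// and /.. steps up to the element that contains each of them
$nodes = $xpath->query('//comment()[normalize-space(.) = "lang"]/..');

$strings = array();
foreach ($nodes as $node) {
    $strings[] = trim($node->textContent);
}
var_dump($strings);
```

This replaces the trim() check inside the PHP loop with an XPath predicate, which keeps the loop body down to collecting text.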