
I couldn't come up with an appropriate title for this question; it is an odd one that came up in my research. Here is an example:

XML text:

The <tag1>quick brown fox</tag1> <tag2>jumps over</tag2> the lazy <tag1>dog</tag1>

Total words (the text inside a tag counts as one word): 6

So my questions are:

What are the positions of <tag1> in the text? The answer is 2 and 6.

What is the position of <tag2> in the text? The answer is 3.

What is the position of the word "lazy" in the text? The answer is 5.

Does anyone have any idea? I don't have a clue how to do this.

andrefadila

1 Answer


Does anyone have any idea? I don't have a clue how to do this.

Load the text into an XML parser, for example by wrapping it in a document (root) element. Then iterate over all child nodes of that element and decide:

  • For each element node, add 1 to the count.
  • For each text node, add the number of words in that text (see other Q&A material on how to count the words in a text).

When you've finished the iteration, you've got the word count.

Example code:

<?php
/**
 * Count Words on XML Text Using PHP
 * @link https://stackoverflow.com/a/17670772/367456
 */

$xmlText = <<<BUFFER
The <tag1>quick brown fox</tag1> <tag2>jumps over</tag2> 
  the lazy <tag1>dog</tag1>
BUFFER;

$doc    = new DOMDocument();
$result = $doc->loadXML(sprintf('<root>%s</root>', $xmlText));
if (!$result) {
    throw new Exception('Invalid XML text given.');
}

/**
 * replace this function with your own implementation that works
 * for all your UTF-8 strings, this is just a quick example mock.
 */
function utf8_count_words($string) {
    return (int)str_word_count($string);
}

$wordCount = 0;
foreach ($doc->documentElement->childNodes as $node) {
    switch ($node->nodeType) {
        case XML_ELEMENT_NODE:
            $wordCount++;
            break;
        case XML_TEXT_NODE:
            $wordCount += utf8_count_words($node->data);
            break;
        default:
            throw new Exception(
                sprintf('Unexpected nodeType in XML-text: %d', $node->nodeType)
            );
    }
}

printf("Result: %d words.\n", $wordCount);

Example output:

Result: 6 words.
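
The same iteration can also record positions rather than only a total, which is what the question actually asks for. The following is a minimal sketch under a few assumptions: the function names `split_words()` and `word_positions()` are hypothetical helpers invented here, words are taken to be whitespace-separated runs (so punctuation stays attached to a word), and node types other than elements and text are skipped instead of raising an exception.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): split a UTF-8
 * string into whitespace-separated words, dropping empty pieces.
 */
function split_words(string $text): array
{
    return preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

/**
 * Hypothetical helper: return the 1-based positions of every tag name
 * and every plain word, where a whole element counts as one word.
 */
function word_positions(string $xmlText): array
{
    $doc = new DOMDocument();
    // Wrap the fragment in a root element so it parses as a document.
    if (!$doc->loadXML(sprintf('<root>%s</root>', $xmlText))) {
        throw new InvalidArgumentException('Invalid XML text given.');
    }

    $positions = ['tags' => [], 'words' => []];
    $position  = 0;
    foreach ($doc->documentElement->childNodes as $node) {
        if ($node->nodeType === XML_ELEMENT_NODE) {
            // An element occupies exactly one position.
            $positions['tags'][$node->nodeName][] = ++$position;
        } elseif ($node->nodeType === XML_TEXT_NODE) {
            // Each word of a text node occupies its own position.
            foreach (split_words($node->data) as $word) {
                $positions['words'][$word][] = ++$position;
            }
        }
    }

    return $positions;
}

$result = word_positions(
    'The <tag1>quick brown fox</tag1> <tag2>jumps over</tag2> the lazy <tag1>dog</tag1>'
);
printf("tag1: %s\n", implode(', ', $result['tags']['tag1']));
printf("tag2: %s\n", implode(', ', $result['tags']['tag2']));
printf("lazy: %s\n", implode(', ', $result['words']['lazy']));
```

For the sample text from the question this reports positions 2 and 6 for tag1, 3 for tag2, and 5 for "lazy". Word lookups are case-sensitive in this sketch, so "The" and "the" are tracked separately.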
hakre
  • This inspired me, thanks. I just added a little bit of code to get the answer to my question. Once again, thanks. – andrefadila Jul 17 '13 at 04:08