
I have the following PHP function that I want to translate to C++:

protected function htmlTag($content, $tag, $attrName, $attrValue, $valueName)
{
    preg_match_all("#<{$tag}[^>]*$attrName=['\"].*?$attrValue.*?['\"][^>]*$valueName=['\"](.+?)['\"][^>]*/?>#i", $content, $matches1);
    preg_match_all("#<{$tag}[^>]*$valueName=['\"](.+?)['\"][^>]*$attrName=['\"].*?$attrValue.*?['\"][^>]*/?>#i", $content, $matches2);

    $result = array_merge($matches1[1], $matches2[1]);
    return empty($result)?false:$result[0];
}

Usage example:

$location = $this->htmlTag($content, 'meta', 'http-equiv', 'X-XRDS-Location', 'content');
$server   = $this->htmlTag($content, 'link', 'rel', 'openid.server', 'href');
$delegate = $this->htmlTag($content, 'link', 'rel', 'openid.delegate', 'href');

(Here $content is the result of $content = curl_exec($curl);)

preg_match_all - Searches subject for all matches to the regular expression given in pattern and puts them in matches in the order specified by flags. After the first match is found, the subsequent searches are continued on from end of the last match.
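There is no single-call equivalent of preg_match_all() in boost::regex; the usual approach is to iterate over matches with a regex iterator. The sketch below uses C++11 std::regex, whose std::sregex_iterator has the same shape as boost::sregex_iterator. matchAllGroup is a hypothetical helper name, and it assumes the default PREG_PATTERN_ORDER, where $matches[1] is the list of every capture of group 1.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects capture group `group` of every
// non-overlapping match of `re` in `subject`, like $matches[group]
// from preg_match_all() with the default PREG_PATTERN_ORDER.
std::vector<std::string> matchAllGroup(const std::string& subject,
                                       const std::regex& re,
                                       std::size_t group = 1)
{
    std::vector<std::string> out;
    // Each increment of the iterator resumes searching at the end of the
    // previous match, as preg_match_all() does.
    for (std::sregex_iterator it(subject.begin(), subject.end(), re), end;
         it != end; ++it)
        out.push_back((*it)[group].str());
    return out;
}
```

For example, with std::regex("<a[^>]*href=['\"](.+?)['\"]", std::regex::icase) applied to "<a href='x'> <A HREF=\"y\">", the result holds "x" and "y".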

How can I translate it using boost::regex?

Rella
  • Would you care to explain for us in words what this function is actually supposed to do, or what it is used for? You may also want to take a look at this previous question: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – John Zwinck Jul 17 '11 at 16:04
  • I know that parsing XML (HTML/XHTML) with regular expressions is bad, but I really need it to work the same way it did in PHP... – Rella Jul 17 '11 at 16:24

1 Answer


Something like this:

#include <string>
#include <boost/format.hpp>
#include <boost/optional.hpp>
#include <boost/regex.hpp>

boost::optional<std::string> htmlTag(const std::string& content,
    const std::string& tag, const std::string& attrName,
    const std::string& attrValue, const std::string& valueName)
{
    // boost::format placeholders are %1%, %2%, ... and the result is
    // converted with boost::str(). PHP's "#...#i" delimiters are not part
    // of the pattern; the "i" modifier becomes boost::regex::icase.
    const boost::regex
        // First PHP pattern: attrName comes before valueName.
        expr1(boost::str(boost::format("<%1%[^>]*%2%=['\"].*?%3%.*?['\"][^>]"
                  "*%4%=['\"](.+?)['\"][^>]*/?>")
                  % tag % attrName % attrValue % valueName),
              boost::regex::icase),
        // Second PHP pattern: valueName comes before attrName.
        expr2(boost::str(boost::format("<%1%[^>]*%4%=['\"](.+?)['\"][^>]*"
                  "%2%=['\"].*?%3%.*?['\"][^>]*/?>")
                  % tag % attrName % attrValue % valueName),
              boost::regex::icase);

    // preg_match_all() returns numerically indexed arrays, and array_merge()
    // simply appends matches2[1] to matches1[1]. Only result[0] is used, so
    // it is the first match of expr1 if any, otherwise the first match of
    // expr2: one regex_search per pattern is enough.
    boost::smatch m;
    if (boost::regex_search(content, m, expr1))
        return m[1].str();
    if (boost::regex_search(content, m, expr2))
        return m[1].str();
    return boost::none;
}
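If Boost is not available, roughly the same translation can be written with the standard <regex> header, where std::regex_search and std::smatch mirror their boost:: counterparts. This is a minimal sketch assuming C++17 (std::optional stands in for boost::optional); the patterns are built by plain string concatenation instead of boost::format.

```cpp
#include <optional>
#include <regex>
#include <string>

// Sketch of htmlTag() with std::regex instead of Boost.Regex. As in the PHP
// original, the arguments are pasted into the pattern unescaped, so the '.'
// in "openid.server" matches any character.
std::optional<std::string> htmlTag(const std::string& content,
    const std::string& tag, const std::string& attrName,
    const std::string& attrValue, const std::string& valueName)
{
    // The PHP "i" modifier becomes std::regex::icase.
    const auto flags = std::regex::ECMAScript | std::regex::icase;
    // attrName before valueName.
    const std::regex expr1("<" + tag + "[^>]*" + attrName + "=['\"].*?" +
        attrValue + ".*?['\"][^>]*" + valueName + "=['\"](.+?)['\"][^>]*/?>",
        flags);
    // valueName before attrName.
    const std::regex expr2("<" + tag + "[^>]*" + valueName +
        "=['\"](.+?)['\"][^>]*" + attrName + "=['\"].*?" + attrValue +
        ".*?['\"][^>]*/?>", flags);

    // First match of expr1 wins; otherwise fall back to expr2.
    std::smatch m;
    if (std::regex_search(content, m, expr1)) return m[1].str();
    if (std::regex_search(content, m, expr2)) return m[1].str();
    return std::nullopt;
}
```

The calls from the question map directly, e.g. htmlTag(content, "link", "rel", "openid.server", "href"), where an empty optional plays the role of PHP's false.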

I may have gotten some details wrong, but hopefully you can work out the rest and post your finished solution.

John Zwinck
  • As far as I can see, array_merge (http://php.net/manual/ru/function.array-merge.php) works like "replace if the key exists, otherwise append"... I'll post the results of porting LightOpenID to a Google Code project later. – Rella Jul 17 '11 at 18:06
  • You might be able to emulate that with std::set_union (matches2 would go before matches1, so that its values are given preference). – John Zwinck Jul 17 '11 at 18:09
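For reference, the PHP manual states that array_merge() overwrites values only for string keys; values with numeric keys are appended and the result is renumbered. Because preg_match_all() returns numerically indexed arrays, the merge in htmlTag() amounts to plain concatenation. std::set_union behaves differently: it expects sorted input ranges and does not repeat elements the two ranges share. A minimal sketch of the concatenation, with arrayMergeList as a hypothetical helper name:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: emulates PHP array_merge() for two numerically
// indexed arrays. Nothing is replaced; `second` is appended to `first`,
// duplicates and original order included.
std::vector<std::string> arrayMergeList(std::vector<std::string> first,
                                        const std::vector<std::string>& second)
{
    first.insert(first.end(), second.begin(), second.end());
    return first;
}
```

Merging {"b", "a"} with {"a", "c"} therefore yields {"b", "a", "a", "c"}, and element 0 of the result always comes from the first array when it is non-empty.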