Matching on varying number of lines with C++ std::regex_replace

Question

I can extract the four-line string with this fragment using C++ std::regex:

  std::regex table("(<table id.*\n.*\n.*\n.*>)");
  const std::string format="$&";
  std::cout <<
     std::regex_replace(tidy_string(/* */)
        ,table
        ,format
        ,std::regex_constants::format_no_copy
        |std::regex_constants::format_first_only
        )
     << '\n';

tidy_string() returns a std::string, and the code produces this output:

<table id="creditPolicyTable" class=
                              "table table-striped table-condensed datatable top-bold-border bottom-border"
                              summary=
                              "This table of Credit Policy gives credit information (column headings) for list of exams (row headings).">

How do I match on text that has a varying number of lines rather than exactly four? For example:

<table id="creditPolicyTable" summary=
                              "This table of Credit Policy gives credit information (column headings) for list of exams (row headings).">

or:

<table id="creditPolicyTable"
    class="table table-striped table-condensed datatable top-bold-border bottom-border"
   summary="This table of Credit Policy gives credit information (column headings) for list of exams (row headings)."
 more="x"
 even_more="y">

You could possibly just use `(<table[^>]*?>)`. This would match everything until the first `>` and therefore give you the content of your `<table>` tag (assuming there are no escaped `>` characters inside). In general I think using regex to parse XML/HTML is not the best approach; have you considered using an XML parser instead (e.g. libxml2)? — ThePhysicist, Aug 21 '17 at 11:45
BTW the `.*` operators that you use above are "greedy", i.e. they try to match as many characters as possible. This could be a problem if you had a very long file with many "<table>" tags inside. — ThePhysicist, Aug 21 '17 at 11:53
i feel obliged to link to this great SO answer, and hope you find an alternate method of parsing xml data. https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Haleemur Ali, Aug 21 '17 at 13:41

score 0 · Answer 1 · answered Aug 21 '17 at 13:26

You should use std::regex_search and lazily match anything but the '>' character, like this:

#include <iostream>
#include <regex>
#include <string>

int main() {
  std::string lines[] = {"<table id=\"creditPolicyTable\" class=\"\
table table-striped -table-condensed datatable top-bold-border bottom-border\"\
summary=\
\"This table of Credit Policy gives credit information (column headings) for list of exams (row headings).\">",
               "<table id=\"creditPolicyTable\" summary=\
               \"This table of Credit Policy gives credit information (column headings) for list of exams (row headings).\"\
               more=\"x\"\
               even_more=\"y\">"};
  std::string result;
  std::smatch table_match;

  std::regex table_regex("<table\\sid=[^>]+?>");

  for (const auto& line : lines){
    if (std::regex_search(line, table_match, table_regex)) {
      for (size_t i = 0; i < table_match.size(); ++i)
        std::cout << "Match found " << table_match[i] << '\n';
    }
  }
}
