My string looks like this:

macd_at([{1036}].CLOSE,10,10,10).UPPER 

I am trying to match this string against the following regex:

([a-zA-Z][a-zA-Z0-9_]*_(at|AT)\((((\[\{[0-9]+\}\](\.(OPEN|CLOSE|LOW|HIGH))?)|[1-9][0-9]*\.?[0-9]*|(TRUE|FALSE)|\"[^"]*\"),)*((\[\{[0-9]+\}\](\.(OPEN|CLOSE|LOW|HIGH))?)|[1-9][0-9]*\.?[0-9]*|(TRUE|FALSE)|\"[^"]*\")\)(\.(VALUE|UPPER|LOWER|PRICE))?)

Online regex testers report a match, but when I call std::regex_search it does not find one. Is there a bug in the VS C++ library?

When I change the string to

macd_at([{1036}],10,10,10).UPPER 

std::regex_search works. Is there a limit on how complicated a regex can be?

PS: I built the regex from the following pieces (shown here because they are easier to read than the full pattern):

const std::string NUMBER_REGEX_PATERN = "[1-9][0-9]*\\.?[0-9]*";
const std::string OPERATOR_REGEX_PATERN = "(\\*|/|-|\\+)";
const std::string SYMBOL_REGEX_PATERN = "\\[\\{[0-9]+\\}\\]";
const std::string SYMBOL_SUFFIX_REGEX_PATERN = "(\\.(OPEN|CLOSE|LOW|HIGH))";
const std::string SYMBOL_WHOLE_REGEX_PATERN = "(" + SYMBOL_REGEX_PATERN + SYMBOL_SUFFIX_REGEX_PATERN + "?)";
const std::string STRING_REGEX_PATERN = "\\\"[^\"]*\\\"";
const std::string BOOLIAN_REGEX_PATERN = "(TRUE|FALSE)";
const std::string LITERAL_REGEX_PATERN = "(" + SYMBOL_WHOLE_REGEX_PATERN + "|" + NUMBER_REGEX_PATERN + "|" + BOOLIAN_REGEX_PATERN +"|" + STRING_REGEX_PATERN + ")";

const std::string STUDY_NAME_REGEX_PATERN = "[a-zA-Z][a-zA-Z0-9_]*_(at|AT)";
const std::string STUDY_SUFFIX_REGEX_PATERN = "(\\.(VALUE|UPPER|LOWER|PRICE))";
const std::string WHOLE_STUDY_REGEX_PATERN = STUDY_NAME_REGEX_PATERN + "\\((" +LITERAL_REGEX_PATERN + ",)*"+ LITERAL_REGEX_PATERN + "\\)";
const std::string WHOLE_STUDY_WITH_SUFIX_REGEX_PATERN = "(" + WHOLE_STUDY_REGEX_PATERN + STUDY_SUFFIX_REGEX_PATERN + "?)";
Vasoli
  • Maybe there is a problem with too much backtracking. In your second-to-last building block, try changing `(LITERAL,)*LITERAL` to `LITERAL(,LITERAL)*`. Both match the same strings, but the latter causes a lot less backtracking. The technique is called [unrolling-the-loop](http://stackoverflow.com/questions/17043454/using-regexes-how-to-efficiently-match-strings-between-double-quotes-with-embed) – Martin Ender Jul 08 '13 at 12:36
  • This solved my problem. But apparently VS C++ compiler had some problem with regexes. Tnx. – Vasoli Jul 08 '13 at 12:46
  • I'll make it an answer then. – Martin Ender Jul 08 '13 at 12:47

1 Answer

Seeing the complexity of the pattern, excessive backtracking might be a problem. One point where you can reduce backtracking significantly is your second-to-last building block. Try changing

...(" +LITERAL_REGEX_PATERN + ",)*"+ LITERAL_REGEX_PATERN...

into

...LITERAL_REGEX_PATERN + "(," + LITERAL_REGEX_PATERN + ")*"...

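This is a simplified form of the unrolling-the-loop technique and reduces the amount of backtracking a lot. In the original form every argument is first tried inside the repetition, so the last argument is matched there, fails on the missing comma, and has to be matched again outside the loop. Note that both patterns match exactly the same strings: one or more literals separated by commas.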

Another point to optimize:

If you don't need all the capturing groups (and I doubt you need them, because some of them get overwritten in the repetition), turn them into non-capturing groups. E.g.

(?:\\.(?:OPEN|CLOSE|LOW|HIGH))

Especially in conjunction with backtracking, unnecessary capturing can get quite expensive.

Martin Ender