8

I'm on Visual Studio 2013 and I'm seeing what I think is a bug, I was hoping someone could confirm?

string foo{ "A\nB\rC\n\r" };
vector<string> bar;

for (sregex_iterator i(foo.cbegin(), foo.cend(), regex("(.*)[\n\r]{1,2}")); i != sregex_iterator(); ++i){
    bar.push_back(i->operator[](1).str());
}

This code hits a Debug Assertion in the Visual Studio regex library:

regex_iterator orphaned

If I define the regex outside the for-loop it's fine:

string foo{ "A\nB\rC\n\r" };
vector<string> bar;
regex bug("(.*)[\n\r]{1,2}");

for (sregex_iterator i(foo.cbegin(), foo.cend(), bug); i != sregex_iterator(); ++i){
    bar.push_back(i->operator[](1).str());
}

Alternatively this works fine in a transform as shown in this question:

string foo{ "A\nB\rC\n\r" };
vector<string> bar;

// This puts {"A", "B", "C"} into bar
transform(sregex_iterator(foo.cbegin(), foo.cend(), regex("(.*)[\n\r]{1,2}")), sregex_iterator(), back_inserter(bar), [](const smatch& i){ return i[1].str(); });

Can someone confirm this is a bug?

Community
  • 1
  • 1
Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288

2 Answers2

11

In C++11 you are allowed to bind a temporary regex to const regex & which can lead to undefined behavior if the iterator is used outside of the lifetime of the temporary since it will store a pointer to it. This is a defect in the specification and it is not an error, although Visual Studio catches this with a debug assert.

sregex_iterator i(foo.cbegin(), foo.cend(), regex("(.*)[\n\r]{1,2}"))
                                            ^^^^^
                                            temporary

The following deleted overload was adding in C++14 to prevent this case, from cppreference:

regex_iterator(BidirIt, BidirIt,
           const regex_type&&,
           std::regex_constants::match_flag_type =
           std::regex_constants::match_default) = delete;       (since C++14)

and it says:

The overload 2 is not allowed to be called with a temporary regex, since the returned iterator would be immediately invalidated.

So this is not a Visual Studio bug since it is implementing the C++11 standard and this was not addressed via a defect report till later on. Both clang and gcc using -std=c++14 or greater will produce an error with your first(see it live) and third(see it live) example. Visual Studio only started supporting some C++14 in VS 2015:

[...]and initial support for certain C++14 features.[...]

We can see that LWG defect 2332: regex_iterator/regex_token_iterator should forbid temporary regexes deals with this:

Users can write "for(sregex_iterator i(s.begin(), s.end(), regex("meow")), end; i != end; ++i)", binding a temporary regex to const regex& and storing a pointer to it. This will compile silently, triggering undefined behavior at runtime. We now have the technology to prevent this from compiling, like how reference_wrapper refuses to bind to temporaries.

As T.C. points out the last example you show is actually ok, even though you are binding a temporary its lifetime extends to the end of the expression.

Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740
  • So what you're saying is the fact it compiled is a bug? And the fact that I could do the `transform` statement is a bug? – Jonathan Mee Apr 27 '15 at 12:36
  • 3
    @JonathanMee: the fact that it compiled means VS 2013 mostly tries to implement C++11, not C++14. The fact that the `transform` worked isn't really a bug. You used a temporary after it was destroyed. That's undefined behavior, so the compiler can do *anything* and still conform with the standard. – Jerry Coffin Apr 27 '15 at 12:55
  • @JerryCoffin Interesting I guess that's the problem with not totally supporting any one standard. I had certainly assumed that the move constructor would hang onto the `regex`. – Jonathan Mee Apr 27 '15 at 13:04
  • When you say: "The behavior for cases 1 and 3" You're talking about the `regex_iterator`s that were constructed by moving a `regex` right? I'm still not following why I didn't get a compile time error since these were `delete`d unless @JerryCoffin 's suggestion is the problem. If that's the case, maybe you could add a note in your answer? – Jonathan Mee Apr 27 '15 at 13:42
  • @JonathanMee updated, I realized, it was not clear on several points. – Shafik Yaghmour Apr 27 '15 at 14:03
  • 2
    @JonathanMee - Note that LWG2332 was reported by Microsoft when the problem was discovered. And that was *after* the release of VS2013. What can you do? – Bo Persson Apr 27 '15 at 14:12
  • @BoPersson How do we know that the bug was submitted by Microsoft? I mean it doesn't seem likely to have been submitted by gcc given their slowness to implement `regex` stuff, but still... – Jonathan Mee Apr 27 '15 at 14:15
  • 2
    @JonathanMee STL(Stephan T. Lavavej) works at MS and is very involved on the committee – Shafik Yaghmour Apr 27 '15 at 14:16
  • @ShafikYaghmour STL in this case referring to "Stephan T. Lavavej" I'm sure... or you were making a funny joke. I feel like you guys are so much cooler than me... you're on a first name basis with the guys building compilers! – Jonathan Mee Apr 27 '15 at 14:20
  • 1
    @JonathanMee I am just a geek and I follow C++ closely, he is well known as STL and that is his id on reddit as well and he is active on C++ threads there. – Shafik Yaghmour Apr 27 '15 at 14:22
  • @JerryCoffin The `transform` version is fine (somewhat by happenstance); the temporary regex isn't destroyed until the end of the full expression, i.e., the `;`, after the `transform` call is complete. – T.C. Apr 27 '15 at 16:10
  • @T.C.: Ah, good point--if I'd been thinking, I'd have realized that. – Jerry Coffin Apr 27 '15 at 16:30
2

No, this is not a bug. See LWG 2329 regex_match()/regex_search() with match_results should forbid temporary strings. This construct exhibits undefined behavior since it binds a temporary regex to const regex& and stores a pointer to it.

Also see C++14 STL Features, Fixes, And Breaking Changes In Visual Studio 14 CTP1 where this is listed as a fix.

Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288
Marius Bancila
  • 16,053
  • 9
  • 49
  • 91
  • Note that LWG 2329 is about temporary `string`s *not* temporary `regex`s. What you're looking for is [LWG 2332](http://cplusplus.github.io/LWG/lwg-defects.html#2332). If Visual Studio's implementation of LWG 2332 realy deleted the `move` constructor for the `regex` it should have failed to compile. – Jonathan Mee Apr 27 '15 at 13:14