I have two strings and I need to parse them as one with boost::regex. To do this, I need to glue the strings together in some boost::string_ref-like object, but no additional allocations are allowed.

In other words, I need something like this:

const char s1[] = "abcd<ht";
const char s2[] = "ml>giuya";

boost::regex e("<[^>]*>");

// this is what I'm looking for
auto glued_string = make_glued_string(s1, sizeof(s1)-1,
                                      s2, sizeof(s2)-1);

boost::regex_iterator<decltype(glued_string)::iterator>
    it(glued_string.begin(), glued_string.end(), e,
    boost::match_default | boost::match_partial);

So the question is: are there any suitable libraries, or do I have to implement this myself? Thanks.

Alexander
  • It seems you're trying to parse some sort of HTML-like syntax with regexes. Please refer to [this answer.](http://stackoverflow.com/a/1732454/2715511) – More Axes Jul 28 '14 at 17:27
  • Thanks, but this is just an example. I certainly need regex for my task. – Alexander Jul 28 '14 at 17:30
  • boost::range::join should work (although I have pointed to wrong version). see this question: http://stackoverflow.com/questions/14366576/boostrangejoin-for-multiple-ranges – firda Jul 28 '14 at 18:00
  • This seems to be what I need, thanks. I'll try it. – Alexander Jul 28 '14 at 18:22

2 Answers

#include <iostream>

#include <boost/range/as_literal.hpp>
#include <boost/range/join.hpp>

const char s1[] = "abcd<ht";
const char s2[] = "ml>giuya";

int main() {
    // as_literal views each array as a string, so the trailing '\0'
    // of s1 does not end up in the middle of the joined range.
    auto glued = boost::range::join(boost::as_literal(s1),
                                    boost::as_literal(s2));
    std::cout << "glued: ";
    for (auto c : glued)
        std::cout << c;
    std::cout << '\n';
}
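
To illustrate why a joined range is enough for a regex iterator, here is a minimal standard-library-only sketch. It is not the Boost implementation: `glued_iterator` and `first_match` are hypothetical names introduced for illustration, and `std::regex` stands in for `boost::regex` (it has no `match_partial` flag, so partial matching would still need Boost). The only assumption is that the regex engine accepts any bidirectional iterator whose value type is `char`.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical bidirectional iterator over two character buffers laid end to end.
// It stores only two pointers, a length and an offset; the buffers are never copied.
class glued_iterator {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    glued_iterator() = default;
    glued_iterator(const char* a, std::size_t na, const char* b, std::size_t pos)
        : a_(a), na_(na), b_(b), pos_(pos) {}

    // Offsets below na_ read from the first buffer, the rest from the second.
    reference operator*() const { return pos_ < na_ ? a_[pos_] : b_[pos_ - na_]; }
    glued_iterator& operator++() { ++pos_; return *this; }
    glued_iterator operator++(int) { glued_iterator t = *this; ++pos_; return t; }
    glued_iterator& operator--() { --pos_; return *this; }
    glued_iterator operator--(int) { glued_iterator t = *this; --pos_; return t; }
    friend bool operator==(const glued_iterator& x, const glued_iterator& y) {
        return x.pos_ == y.pos_;
    }
    friend bool operator!=(const glued_iterator& x, const glued_iterator& y) {
        return !(x == y);
    }

private:
    const char* a_ = nullptr;
    std::size_t na_ = 0;
    const char* b_ = nullptr;
    std::size_t pos_ = 0;
};

// Returns the first match of `pattern` in the logical concatenation of the
// two buffers, or an empty string if nothing matches.
inline std::string first_match(const char* a, std::size_t na,
                               const char* b, std::size_t nb,
                               const std::string& pattern) {
    const std::regex e(pattern);
    glued_iterator first(a, na, b, 0), last(a, na, b, na + nb);
    std::regex_iterator<glued_iterator> it(first, last, e), end;
    return it == end ? std::string() : it->str();
}
```

Called as `first_match(s1, sizeof(s1)-1, s2, sizeof(s2)-1, "<[^>]*>")` with the strings from the question, this finds the tag that spans both buffers. Note that only the iterator is allocation-free; the regex engine and the returned `std::string` may still allocate internally.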
firda

Here's your answer - 100% efficient (ish).

To pre-empt early criticism, copying is almost always faster than a chain of references.

#include <cstddef>
#include <iostream>
#include <iterator>
#include <string>

using namespace std;

// Copies two char arrays into one std::string.
// N1 and N2 count the terminating '\0', so each copy stops one short of end().
template<typename T1, size_t N1, typename T2, size_t N2>
string glue_string(T1 (&src1)[N1], T2 (&src2)[N2])
{
    string s(begin(src1), end(src1) - 1);
    s.insert(end(s), begin(src2), end(src2) - 1);
    return s;
}

int main()
{
    const char s1[] = "Hello, ";
    const char s2[] = "World";
    cout << glue_string(s1, s2) << endl;

    return 0;
}

But are you sure you didn't really want this:

#include <iostream>
#include <iterator>

// Adjacent string literals are concatenated by the compiler.
const char src[] = "Hello,"
" World";

using namespace std;

int main()
{
    auto first = begin(src);
    auto last = end(src) - 1; // stop before the terminating '\0'
    for( ; first != last ; ++first)
        cout.put(*first);
    return 0;
}

The string concatenation is done by the compiler, so no strings are copied at run time. Since C++11, begin() and end() work on C-style arrays whenever the compiler knows the length (which includes the terminating '\0').
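
The compile-time concatenation can be checked with a small sketch. The names `joined`, `literal_length` and `without_terminator` are hypothetical helpers introduced here for illustration; the sketch only assumes C++11.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Adjacent literals are merged during translation, so this is one array.
constexpr char joined[] = "Hello," " World";

// 12 visible characters plus the terminating '\0'.
static_assert(sizeof(joined) == 13, "array length includes the terminator");

// Length of a char array without its terminator.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) { return N - 1; }

// Builds a std::string from the array, stopping before the '\0'.
// std::end() on the array points one past the terminator, hence the "- 1".
template <std::size_t N>
std::string without_terminator(const char (&text)[N]) {
    return std::string(std::begin(text), std::end(text) - 1);
}
```

This only works when both pieces are literals in the source; it does not help when the two strings arrive separately at run time, as in the question.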

Richard Hodges
  • I need to pass two strings as one to boost::regex, and no additional allocations are allowed. Please read carefully before answering. – Alexander Jul 28 '14 at 18:24
  • The "copying is almost always faster" claim seems ludicrous. It depends on (and only on) the usage patterns, since what chaining avoids - the allocations! - can easily dominate the runtime cost in many cases. – sehe Jul 28 '14 at 20:34
  • It's counterintuitive but not ludicrous. A solution that involves chaining iterator pairs requires storage for at least 2 pointers per string part (begin and end). That's 8 or 16 bytes depending on the architecture. So in the above example, a boost::join would produce a structure that contains at least 4 pointers, plus an inter-string iterator, plus an intra-string iterator, i.e. at least 24 bytes. With SSO (short string optimisation) you'll find that the std::string solution is likely to be at least as efficient. – Richard Hodges Jul 29 '14 at 08:21
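
The size arithmetic in the last comment can be sketched as follows. `two_part_view` is a hypothetical stand-in that holds one begin/end pointer pair per string part; it is not the actual layout of Boost's joined range, and the SSO capacity of `std::string` is implementation-defined, so the numbers will vary by platform and standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a joined range: one begin/end pair per part.
struct two_part_view {
    const char* first_begin;
    const char* first_end;
    const char* second_begin;
    const char* second_end;
};

// Bytes needed just to describe the two parts, before any iterator state.
constexpr std::size_t view_bytes() { return sizeof(two_part_view); }

// Number of characters an empty std::string can take before it has to
// grow its storage. Implementation-defined; with SSO this is the size
// of the in-object buffer.
inline std::size_t sso_capacity() { return std::string().capacity(); }
```

Comparing `view_bytes()` against `sizeof(std::string)` and `sso_capacity()` on the target platform shows whether a given pair of short strings would fit in the small-string buffer. Whether copying then beats chaining still depends on the usage pattern, as the earlier comment points out.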