
I am using a Boost regex on a Boost circular buffer and would like to "remember" the positions where matches occur. What's the best way to do this? I tried the code below, but "end" seems to store the same values every time. When I try to traverse from a previous "end" to the most recent "end", for example, it doesn't work.

  boost::circular_buffer<char> cb(2048);
  typedef boost::circular_buffer<char>::iterator  ccb_iterator;
  boost::circular_buffer<ccb_iterator> cbi(4); 

  //just fill the whole cbi with cb.begin()  
  cbi.push_back(cb.begin());
  cbi.push_back(cb.begin());
  cbi.push_back(cb.begin());
  cbi.push_back(cb.begin());


  typedef boost::regex_iterator<boost::circular_buffer<char>::iterator> circular_regex_iterator;

  while (1)
  {
    // insert new data in circular buffer (omitted)
    // basically reads data from file and pushes it back to cb

    boost::circular_buffer<char>::iterator start, end;

    circular_regex_iterator regexItr(
        cb.begin(),
        cb.end(),
        re, // the regular expression
        boost::match_default | boost::match_partial);
    circular_regex_iterator last;

    while (regexItr != last)
    {
      if ((*regexItr)[0].matched == false)
      {
        // partial match
        break;
      }
      else
      {
        // full match:
        start = (*regexItr)[0].first;
        end = (*regexItr)[0].second;

        // I want to store these "end" positions to use later so that I can
        // traverse the buffer between these positions (matches).

        // cbi stores positions of these matches, but this does not seem to work!
        cbi.push_back(end);

        // for example, cbi[2] --> cbi[3] traversal works only the first time this
        // loop is run!
      }

      ++regexItr;
    }
  }

Abryan
  • What is `circular_regex_iterator`? Can you link some references? – Kerrek SB Aug 15 '11 at 11:55
  • Edited above... it is a typedef, as follows: `typedef regex_iterator<circular_buffer<char>::iterator> circular_regex_iterator;` – Abryan Aug 15 '11 at 11:57
  • Hm, I'm trying to see through this, but all this is new to me. Can we make some simplifications? Is it relevant that you have a circular buffer, or can we just treat it as some generic range of characters? – Kerrek SB Aug 15 '11 at 12:24
  • Yes, we can assume some generic range of characters... the reason for the circular buffer is that data comes in as a stream and I want to work only with the newest 2K worth of data... – Abryan Aug 15 '11 at 12:36
  • Why are you storing the `first`/`second` iterators in another circular buffer? There could be arbitrarily many of those, why not put those into a linear container, preferably a container of pairs? – Kerrek SB Aug 15 '11 at 12:37
  • Is your regex actually returning different iterators? – Jason Aug 15 '11 at 13:00
  • I am storing "end" only! That way I know that [cbi[2], cbi[3]) holds the data between the two most recent matches, [cbi[1], cbi[2]) the data between the matches before that, and so on. – Abryan Aug 15 '11 at 13:13
  • Jason, no it does not return different iterators, which is the problem! – Abryan Aug 15 '11 at 13:30

1 Answer


This isn't quite as much an answer as an attempt to reconstruct what you're doing. I'm making a simple circular buffer initialized from a string, and I traverse regex matches through that buffer and print the matched ranges. All seems to work fine.

I would not recommend storing the ranges themselves in a circular buffer; or at the very least the ranges should be stored in pairs.

Here's my test code:

#include <iostream>
#include <iterator>   // std::distance
#include <string>
#include <boost/circular_buffer.hpp>
#include <boost/regex.hpp>
#include "prettyprint.hpp"

typedef boost::circular_buffer<char> cb_char;
typedef boost::regex_iterator<cb_char::iterator> cb_char_regex_it;

int main()
{
  std::string sample = "Hello 12 Worlds 34 ! 56";
  cb_char cbc(8, sample.begin(), sample.end());

  std::cout << cbc << std::endl;    // (*)

  boost::regex expression("\\d+");  // just match numbers

  for (cb_char_regex_it m2, m1(cbc.begin(), cbc.end(), expression); m1 != m2; ++m1)
  {
    const auto & mr = *m1;
    std::cout << "--> " << mr << ", range ["
              << std::distance(cbc.begin(), mr[0].first) << ", "
              << std::distance(cbc.begin(), mr[0].second) << "]" << std::endl;
  }
}

(This uses the pretty printer to print the raw circular buffer; you can remove the line marked (*).)


Update: Here's a possible way to store the matches:

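#include <utility>   // std::pair
#include <vector>

// Each match is stored as a pair of offsets from cbc.begin(): [first, second)
typedef std::pair<std::size_t, std::size_t> match_range;
typedef std::vector<match_range>            match_ranges;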

/* ... as before ... */

  match_ranges ranges;

  for (cb_char_regex_it m2, m1(cbc.begin(), cbc.end(), expression); m1 != m2; ++m1)
  {
    const auto & mr = *m1;

    ranges.push_back(match_range(std::distance(cbc.begin(), mr[0].first), std::distance(cbc.begin(), mr[0].second)));

    std::cout << "--> " << mr << ", range " << ranges.back() << std::endl;
  }

  std::cout << "All matching ranges: " << ranges << std::endl;
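
Note that offsets measured from cbc.begin() shift as soon as old characters are overwritten. One way around that, sketched below, is to store each match end as an absolute stream position (counted from the first character ever pushed) and to translate it back to a buffer offset on demand. `MatchEndTracker` and its members are hypothetical names, not part of Boost, and the sketch assumes the buffer is only ever filled with push_back and that `on_push` is called for every character pushed. Since the whole buffer is rescanned each time, an end that is not past the last stored one is treated as an already-seen match and skipped; that is a design choice, not something the regex library does for you.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper (not part of Boost): remembers the last `keep` match
// ends as absolute stream positions so they survive later push_back calls.
class MatchEndTracker {
public:
    MatchEndTracker(std::size_t capacity, std::size_t keep)
        : capacity_(capacity), keep_(keep), total_pushed_(0) {}

    // Call for every character (or batch of characters) pushed into the buffer.
    void on_push(std::size_t count = 1) { total_pushed_ += count; }

    // Absolute stream position of the element currently at cb.begin().
    // Assumption: the buffer is filled with push_back only, so a full buffer
    // drops exactly one front element per new element.
    std::size_t base() const {
        return total_pushed_ > capacity_ ? total_pushed_ - capacity_ : 0;
    }

    // Record a match end given as an offset from cb.begin(), e.g.
    // std::distance(cb.begin(), match[0].second).
    void record_end(std::size_t offset_from_begin) {
        const std::size_t absolute = base() + offset_from_begin;
        // Rescanning the buffer finds old matches again; skip those.
        if (!ends_.empty() && absolute <= ends_.back()) return;
        ends_.push_back(absolute);
        if (ends_.size() > keep_) ends_.pop_front();  // keep only the newest ones
    }

    // Translate stored end number `index` (0 = oldest) back into an offset
    // from cb.begin(). Returns false if that position was already overwritten.
    bool to_offset(std::size_t index, std::size_t& offset) const {
        assert(index < ends_.size());
        if (ends_[index] < base()) return false;
        offset = ends_[index] - base();
        return true;
    }

    std::size_t size() const { return ends_.size(); }

private:
    std::size_t capacity_;          // capacity of the circular buffer
    std::size_t keep_;              // how many match ends to remember
    std::size_t total_pushed_;      // characters pushed so far
    std::deque<std::size_t> ends_;  // absolute positions, oldest first
};
```

With a tracker built as `MatchEndTracker t(2048, 4)`, the inner loop would call `t.record_end(std::distance(cb.begin(), end))`, and the data between two consecutive matches is the range from `cb.begin() + offset_i` to `cb.begin() + offset_j` whenever both `to_offset` calls succeed.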
Kerrek SB
  • This is great. What I am trying to do, using your example, is keep track of the ranges [cbc.begin() ... mr[0].second] as more data is inserted into the cb... I need the last 3 ranges! I am aware that my oldest range might have been overwritten, but given that matches happen quite frequently in a 2K buffer, this should not be an issue. – Abryan Aug 15 '11 at 13:28
  • Abryan: Are you modifying the original circular buffer *while* you are matching the regex? I think you definitely have to keep your hands off the buffer if you want anything to make sense. That's why I asked whether the buffer might as well just be a generic range. – Kerrek SB Aug 15 '11 at 13:40
  • No, I don't! The buffer is not modified _while_ matching the regex! But it will be modified the next time we read from the file in the _while_ loop above in my code. That shouldn't be a problem, right? Also, why is storing iterators to cb a bad idea? – Abryan Aug 15 '11 at 13:58
  • @Abryan: Storing iterators should be OK. They're random-access, so it's essentially the same as storing numerical indices. I just wouldn't store the iterator in a circular buffer. Just beware that iterators will be *invalid* after you modify the CB, while numerical offsets will be *meaningless*. Let me add some code, though. – Kerrek SB Aug 15 '11 at 14:00
  • Ok, used pairs of iterators as you suggested and it works fine for storing the last match (beginning of the buffer --> end of first match). I want to keep track of these ranges as I read in more data storing the last three of them... so it will look something like this (1) [a1 ... a2] (2) [a2...a3] (3) [a3...a4] where a4 is the last match in the circular buffer, a2 the match before that, and a1 the match before a2. – Abryan Aug 15 '11 at 14:16
  • But if you read more data into the CB, all your previous range data becomes meaningless. What do you want to do about that? It might be better to just *copy* the match into some other structure. – Kerrek SB Aug 15 '11 at 14:31
  • Yes, that's true if I overwrite my range data. The reality is that consecutive matches always happen within just a couple of bytes... I might overwrite a match that happened a very long time ago, but that's fine. I am only after the last 3 ranges! – Abryan Aug 15 '11 at 14:44