I'm trying to match a specific character together with the two characters after it. Those trailing characters may themselves contain the specified character, which is fine, but each such occurrence also needs to be captured as the beginning of its own (overlapping) match.

This code should illustrate what I mean:

extern crate regex;
use regex::Regex;


pub fn main() {
    let re = Regex::new("(a..)").unwrap();
    let st = String::from("aba34jf baacdaab");
    println!("String to match: {}", st);

    for cap in re.captures_iter(&st) {
        println!("{}", &cap[1]);
        // Prints "aba" and "aac",
        // Should print "aba", "a34", "aac", "acd", "aab"
    }
}

How do I get overlapping captures without using look-around (which Rust's regex crate doesn't support)? Is there something similar to what Python offers (as mentioned here), but in Rust?

Edit:

Using onig as BurntSushi5 suggested, we get the following:

extern crate onig;
use onig::*;

pub fn main() {
    let re = Regex::new("(?=(a.{2}))").unwrap();
    let st = String::from("aba34jf baacdaab");
    println!("String to match: {}", st);

    for ch in re.find_iter(&st) {
        print!("{} ", &st[ch.0..=ch.1+2]);
        // aba a34 aac acd aab, as it should.
        // but we have to know how long the capture is.
    }
    println!();
}

The problem with this is that you have to know how long the match is: a look-ahead is zero-width, so find_iter only reports the position where it matched, not the text inside it. Is there a way to get the text captured inside the look-ahead without knowing its length beforehand? How would we print it if the regex were something like (?=(a.+))?

Major
  • The regex crate says look around isn't supported, so I didn't even try it. I get the error: `error: look-around, including look-ahead and look-behind, is not supported`. – Major Aug 15 '19 at 13:43

2 Answers

You can't. Your only recourse is to either find a different approach entirely, or use a different regex engine that supports look-around like onig or pcre2.

BurntSushi5
  • It's not possible to get the offset of the first match, then start the next search one codepoint in, repeating until no more matches? – Shepmaster Aug 14 '19 at 15:06
  • Yeah I guess that to me falls under "different approach." I don't think it will work in all cases. When I get a chance I can update my answer to include some code for that. – BurntSushi5 Aug 14 '19 at 19:19
  • I used the onig binding and the look-ahead method works now. However, this particular method requires knowing how long the capture will be because we can't capture the part inside the look-ahead part. Do you know if there's an option to turn that on? I can't find documentation for that at all. – Major Aug 16 '19 at 17:29
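
The idea from the comments above (take the offset of the first match, then restart the search one code point further in) could be sketched roughly like this. This is only a sketch using the standard library: `overlapping_matches` and `find_a_dot_dot` are hypothetical names, and the `find` closure stands in for whatever "leftmost match" call a regex engine provides. Note that searching a sliced remainder changes how anchors and word boundaries behave in a real engine, which may be one reason this doesn't work in all cases.

```rust
/// Collects overlapping matches by restarting the search one code point
/// after the start of each match (not after its end).
///
/// `find` is a hypothetical stand-in for a regex engine's leftmost-match
/// call: given a haystack, it returns the byte range of the first match.
fn overlapping_matches<'a, F>(text: &'a str, find: F) -> Vec<&'a str>
where
    F: Fn(&str) -> Option<(usize, usize)>,
{
    let mut found = Vec::new();
    let mut offset = 0;
    while offset <= text.len() {
        // Search only the part of the text we have not restarted past yet.
        let (start, end) = match find(&text[offset..]) {
            Some(range) => range,
            None => break,
        };
        found.push(&text[offset + start..offset + end]);
        // Advance by the byte length of the first code point of this match.
        let first_len = text[offset + start..]
            .chars()
            .next()
            .map_or(1, char::len_utf8);
        offset += start + first_len;
    }
    found
}

/// Hand-written approximation of the pattern `a..`:
/// an 'a' followed by any two characters.
fn find_a_dot_dot(haystack: &str) -> Option<(usize, usize)> {
    for (start, ch) in haystack.char_indices() {
        if ch != 'a' {
            continue;
        }
        // The third char of the candidate (index 2) marks where the match ends.
        if let Some((idx, last)) = haystack[start..].char_indices().nth(2) {
            return Some((start, start + idx + last.len_utf8()));
        }
    }
    None
}
```

Calling `overlapping_matches("aba34jf baacdaab", find_a_dot_dot)` yields the five overlapping matches from the question. Because each match is returned as a slice, its length does not need to be known in advance.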

I found a solution, unfortunately not regex though:

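pub fn main() {
    print_char_matches("aba34jf baacdaab", 'a', 2);
    // Prints: aba a34 aac acd aab, as it should.
}

/// Prints every occurrence of `char_match` together with the
/// `match_length` characters that follow it; matches may overlap.
pub fn print_char_matches(st: &str, char_match: char, match_length: usize) {
    let chars: Vec<char> = st.chars().collect();

    println!("String to match: {}", st);

    // saturating_sub avoids an underflow panic when the input is
    // shorter than `match_length`.
    for i in 0..chars.len().saturating_sub(match_length) {
        if chars[i] == char_match {
            for j in 0..=match_length {
                print!("{}", chars[i + j]);
            }
            print!(" ");
        }
    }
    println!();
}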

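This is a bit more generalizable: it matches the given character plus the specified number of characters (not only digits) after it. It works on chars (Unicode scalar values), so it is not strictly limited to ASCII, but it does not account for grapheme clusters made of several code points.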
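
If the matches are needed as values rather than printed output, a variant along these lines might help. It is a sketch under the same assumptions as the code above; `char_matches` is a hypothetical name, and it returns slices of the input instead of printing.

```rust
/// Returns every occurrence of `char_match` together with the
/// `match_length` characters that follow it, as slices of `st`.
/// Matches may overlap; occurrences too close to the end are skipped.
fn char_matches(st: &str, char_match: char, match_length: usize) -> Vec<&str> {
    // Byte offset of every char boundary, including the end of the string.
    let bounds: Vec<usize> = st
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(st.len()))
        .collect();

    let mut matches = Vec::new();
    for (n, ch) in st.chars().enumerate() {
        if ch == char_match {
            // bounds[n] starts this char; the window ends after
            // `match_length` further chars, if that many exist.
            if let Some(&end) = bounds.get(n + match_length + 1) {
                matches.push(&st[bounds[n]..end]);
            }
        }
    }
    matches
}
```

Slicing by recorded char boundaries keeps the function safe for multi-byte characters, and using `get` instead of indexing means short inputs simply produce no matches.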

Major