
I have a string that is separated by a delimiter. I want to split this string using regex and keep the delimiters.

My current code is:

```rust
use regex::Regex; // 1.1.8

fn main() {
    let separator = Regex::new(r"([ ,.]+)").expect("Invalid regex");
    // `split` already returns an iterator, so it can be collected directly.
    let splits: Vec<_> = separator.split("this... is a, test").collect();
    for split in splits {
        println!("\"{}\"", split);
    }
}
```

Its output is:

"this"
"is"
"a"
"test"

I would like to keep the separators (in this case the runs of spaces, commas, and periods). The output I would like to see is:

"this"
"... "
"is"
" "
"a"
", "
"test"

Is it possible to achieve this behavior with the regex crate, and if so, how?

This is different from Split a string keeping the separators, which uses the standard library and not the regex crate.

Shepmaster
Ian Rehwinkel
  • Why do you want to keep them? You know they are spaces. – Stargateur Jul 07 '19 at 11:24
  • The space is just an example. I will actually be matching other characters/sequences. I'll edit the question to clarify. – Ian Rehwinkel Jul 07 '19 at 11:26
  • The duplicate isn't an exact match, since it does not use the `regex` crate. I believe in this case your best option is to use `find_iter()` to either find all separators and their start and stop indices, or to extend the regex to [match either a separator or the text between the separators](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=cac108acfeb2a61a6c2293c47e8082c6). (I would add this as an answer, but it doesn't really fit for the dupe, so I'll leave it as a comment.) – Sven Marnach Jul 07 '19 at 11:43
  • @SvenMarnach in the future, feel free to edit the question to make it a distinctly clear duplicate. You can also @-mention me directly if I did the closing / editing. It's reopened now. – Shepmaster Jul 07 '19 at 15:39



As documented on the `Regex` type:

> Using the `std::str::pattern` methods with `Regex`
>
> Note: This section requires that this crate is compiled with the `pattern` Cargo feature enabled, which requires nightly Rust.
>
> Since `Regex` implements `Pattern`, you can use regexes with methods defined on `&str`. For example, `is_match`, `find`, `find_iter` and `split` can be replaced with `str::contains`, `str::find`, `str::match_indices` and `str::split`.

Using the pattern feature, you can use the techniques described in Split a string keeping the separators:

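```rust
// Requires nightly Rust and, in Cargo.toml:
// regex = { version = "1.1.8", features = ["pattern"] }
use regex::Regex; // 1.1.8

fn split_keep<'a>(r: &Regex, text: &'a str) -> Vec<&'a str> {
    let mut result = Vec::new();
    let mut last = 0;
    // With the `pattern` feature, `&Regex` implements `Pattern`,
    // so it can be passed to `str::match_indices`.
    for (index, matched) in text.match_indices(r) {
        // Text between the previous separator and this one
        if last != index {
            result.push(&text[last..index]);
        }
        // The separator itself
        result.push(matched);
        last = index + matched.len();
    }
    // Trailing text after the last separator
    if last < text.len() {
        result.push(&text[last..]);
    }
    result
}

fn main() {
    let separator = Regex::new(r"([ ,.]+)").expect("Invalid regex");
    let splits = split_keep(&separator, "this... is a, test");
    for split in splits {
        println!("\"{}\"", split);
    }
}
```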

This also hints at how to transform the code so that it does not require nightly Rust:

> For example, [...] `find_iter` [...] can be replaced with [...] `str::match_indices`

Apply the reverse transformation, replacing `str::match_indices` with `Regex::find_iter`, to use only stable `Regex` methods.

Shepmaster
  • I'm getting `error[E0277]: expected a std::ops::Fn<(char,)> closure, found regex::re_unicode::Regex` in `split_keep` at `text.match_indices(r)` in Rust 1.42.0-nightly and 1.40. – stuart Feb 05 '20 at 21:31
  • @stuart to verify, did you follow the numerous warnings at the top of the answer about enabling the `pattern` feature? – Shepmaster Feb 05 '20 at 21:34
  • Ah, it works with this in `Cargo.toml`, despite the flood of VS Code warnings it triggers: `regex = { version = "1.1.8", features = ["pattern"] }` – stuart Feb 06 '20 at 01:57