20

I found this example for substring replacement:

use std::str;
let string = "orange";
let new_string = str::replace(string, "or", "str");

If I want to run a number of consecutive replacements on the same string, for sanitization purposes, how can I do that without allocating a new variable for each replacement?

If you were to write idiomatic Rust, how would you write multiple chained substring replacements?

Shepmaster
mkaito

5 Answers

13

The regex engine can be used to do a single pass with multiple replacements of the string, though I would be surprised if this is actually more performant:

extern crate regex;

use regex::{Captures, Regex};

fn main() {
    let re = Regex::new("(or|e)").unwrap();
    let string = "orange";
    let result = re.replace_all(string, |cap: &Captures| {
        match &cap[0] {
            "or" => "str",
            "e" => "er",
            _ => panic!("We should never get here"),
        }.to_string()
    });
    println!("{}", result);
}
Shepmaster
wingedsubmariner
  • 1
    As of version 1.0.5 of the regex crate, the `.to_string()` in `match &cap[0] { ... }.to_string()` is not needed, since the `Regex::replace_all` closure can return any `T: AsRef<str>`, not just `String`. So it can return string literals (`&'static str`) directly. – Arnavion Sep 07 '18 at 22:42
  • Suppose I have a capture group with special characters and placeholders, how would I go about that in the match directive? I don't know the match in the beginning, so spelling out the whole possible match won't work. – bellackn Jul 27 '22 at 14:59
10

I would not use regex or .replace().replace().replace() or .maybe_replace().maybe_replace().maybe_replace() for this. They all have big flaws.

  • Regex is probably the most reasonable option but regexes are just a terrible terrible idea if you can at all avoid them. If your patterns come from user input then you're going to have to deal with escaping them which is a security nightmare.
  • .replace().replace().replace() is terrible for obvious reasons.
  • .maybe_replace().maybe_replace().maybe_replace() is only very slightly better than that, because it only improves efficiency when a pattern doesn't match. It doesn't avoid the repeated allocations if they all match, and in that case it is actually worse because it searches the strings twice.

There's a much better solution: use the aho-corasick crate. There's even an example in the readme:

use aho_corasick::AhoCorasick;

let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let replace_with = &["sloth", "grey", "slow"];

// Since aho-corasick 1.0, `new` returns a `Result`, hence the `unwrap`.
let ac = AhoCorasick::new(patterns).unwrap();
let result = ac.replace_all(haystack, replace_with);
assert_eq!(result, "The slow grey sloth.");

for sanitization purposes

I should also say that blacklisting "bad" strings is completely the wrong way to do sanitisation.

Timmmm
5

how would you write multiple chained substring replacements?

I would do it just as asked:

fn main() {
    let a = "hello";
    let b = a.replace("e", "a").replace("ll", "r").replace("o", "d");
    println!("{}", b);
}
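
When the replacements are data rather than fixed code, the same chain can be written as a fold. This is only a sketch: `replace_chain` is a hypothetical helper name, not something from the standard library, and it still allocates a new `String` for every pair, exactly like the chained calls above.

```rust
/// Applies each (needle, replacement) pair in order, feeding the output of
/// one replacement into the next. Hypothetical helper, not a std function.
fn replace_chain(input: &str, pairs: &[(&str, &str)]) -> String {
    pairs
        .iter()
        .fold(input.to_owned(), |acc, &(needle, replacement)| {
            // Each step allocates a fresh String, same as `.replace().replace()`.
            acc.replace(needle, replacement)
        })
}
```

Calling `replace_chain("hello", &[("e", "a"), ("ll", "r"), ("o", "d")])` should give the same result as the chained version.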

If you are asking how to do multiple concurrent replacements, passing through the string just once, then it does indeed get much harder.

This does require allocating new memory for each replace call, even if no replacement was needed. An alternate implementation of replace might return a Cow<str> which only includes the owned variant when the replacement would occur. A hacky implementation of that could look like:

use std::borrow::Cow;

trait MaybeReplaceExt<'a> {
    fn maybe_replace(self, needle: &str, replacement: &str) -> Cow<'a, str>;
}

impl<'a> MaybeReplaceExt<'a> for &'a str {
    fn maybe_replace(self, needle: &str, replacement: &str) -> Cow<'a, str> {
        // Assumes that searching twice is better than unconditionally allocating
        if self.contains(needle) {
            self.replace(needle, replacement).into()
        } else {
            self.into()
        }
    }
}

impl<'a> MaybeReplaceExt<'a> for Cow<'a, str> {
    fn maybe_replace(self, needle: &str, replacement: &str) -> Cow<'a, str> {
        // Assumes that searching twice is better than unconditionally allocating
        if self.contains(needle) {
            self.replace(needle, replacement).into()
        } else {
            self
        }
    }
}

fn main() {
    let a = "hello";
    let b = a.maybe_replace("e", "a")
        .maybe_replace("ll", "r")
        .maybe_replace("o", "d");
    println!("{}", b);

    let a = "hello";
    let b = a.maybe_replace("nope", "not here")
        .maybe_replace("still no", "i swear")
        .maybe_replace("but no", "allocation");
    println!("{}", b);
    assert_eq!(b.as_ptr(), a.as_ptr());
}
Shepmaster
  • How do you go from `str::replace(string, "or", "str");` to being able to chain method calls on string objects? That is, why does that work? – mkaito Dec 15 '14 at 20:21
  • I'm not sure I understand... You can just call it: `"hello".replace("h", "y")`. – Shepmaster Dec 15 '14 at 20:23
  • Is this similar to how D can turn `method(str, arg)` into `str.method(arg)`? In case you're not familiar with D, the compiler turns the latter into the former at compile time transparently. – mkaito Dec 15 '14 at 20:24
  • 1
    I think D refers to that as Universal Function Call Syntax (UFCS), and Rust *has* something called UFCS, but as far as I know, they are different. In this case, there are **two** definitions of `replace`: a [free function](http://doc.rust-lang.org/std/str/fn.replace.html) and a [trait method](http://doc.rust-lang.org/std/str/trait.StrAllocating.html#tymethod.replace). – Shepmaster Dec 15 '14 at 20:26
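
For readers on current Rust, where the links in the comment above are out of date: `replace` is now an inherent method on `str`, so method syntax and path syntax name the same function. A small sketch, where `replace_both_ways` is a hypothetical name used only for illustration:

```rust
/// Calls `replace` with both syntaxes and returns the two results.
/// Hypothetical helper for illustration only.
fn replace_both_ways(s: &str) -> (String, String) {
    // Method-call syntax.
    let via_method = s.replace("or", "str");
    // Path syntax: the same method, named through the `str` type.
    let via_path = str::replace(s, "or", "str");
    (via_method, via_path)
}
```

Because both forms return an owned `String`, and `String` dereferences to `str`, the method form can be chained directly.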
3

There is no way in the standard library to do this; it’s a tricky thing to get right with a large number of variations on how you would go about doing it, depending on a number of factors. You would need to write such a function yourself.
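
As a rough idea of what writing such a function yourself could look like, here is a minimal single-pass sketch using only the standard library. `replace_many` is a hypothetical name, and the policy it picks (at each position the first matching pair in the list wins, and replaced text is never rescanned) is just one of the many possible variations. It tries every needle at every position, so it is not tuned for large pattern sets.

```rust
/// Replaces every occurrence of any needle in a single left-to-right pass.
/// At each position the first matching pair in `pairs` wins, and text that
/// has been written to the output is never scanned again.
/// Hypothetical helper, not a std function.
fn replace_many(input: &str, pairs: &[(&str, &str)]) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(ch) = rest.chars().next() {
        // Look for a non-empty needle that matches at the current position.
        let hit = pairs
            .iter()
            .find(|(needle, _)| !needle.is_empty() && rest.starts_with(*needle));
        match hit {
            Some(&(needle, replacement)) => {
                out.push_str(replacement);
                rest = &rest[needle.len()..];
            }
            None => {
                // No needle matches here: copy one character and move on.
                out.push(ch);
                rest = &rest[ch.len_utf8()..];
            }
        }
    }
    out
}
```

With the pairs `("or", "str")` and `("e", "er")`, `replace_many("orange", ...)` should produce `"stranger"`. Since the output is never rescanned, swapping pairs such as `("a", "b")` and `("b", "a")` behave as a swap, which a sequence of separate `replace` calls would not.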

Chris Morgan
  • I know that in Ruby, I can chain gsub calls together, and it will do the right thing. I figure Rust could do the same, if the replace function were a method of the string object. But it isn't. If you were to write idiomatic Rust, whatever that means, would you just make a new variable for each step? – mkaito Dec 15 '14 at 04:07
  • 2
    It would depend on what I was doing and the nature of the replacements. Is it a character-wise translation? Are all the from/to pairs equivalently sized or not? How many are there? I would always be inclined to write it as multiple chained replacements initially until I *knew* performance was going to be an issue. – Chris Morgan Dec 15 '14 at 04:16
-4

Stumbled upon this in codewars. Credit goes to user gom68

fn replace_multiple(rstring: &str) -> String {
  rstring.chars().map(|c|
      match c {
          'A' => 'Z',
          'B' => 'Y',
          'C' => 'X',
          'D' => 'W',
          s => s
      }
  ).collect::<String>()
}
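
The version above can only map one character to one character. If a character has to become a longer string, a similar single pass can push into a `String` instead of collecting chars. This is a sketch under assumptions: `escape_chars` is a hypothetical name, and the three characters handled are only an example, not a complete sanitizer.

```rust
/// Replaces a few single characters with longer strings in one pass.
/// Hypothetical example; the set of characters handled is an assumption.
fn escape_chars(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            // Everything else is copied through unchanged.
            other => out.push(other),
        }
    }
    out
}
```

Because each input character is inspected exactly once, the `&` in an already emitted `&lt;` is never escaped a second time.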