I'm looking to extract:

50%

From a string that will have more or less this format:

The 50% is in here somewhere.

I'd also like to extract:

50%50%25%

From a string like this:

50% of 50% is 25%

Regex.Match() seems the obvious contender. However, this involves checking if any matches were found (e.g. match.Success), extracting the results from a particular index in the array, and/or the risk of addressing an out-of-bounds index.

Regex replace is generally simpler to apply. A single line does the job, including returning the resulting string. This is true for so many languages.

result = Regex.Replace(input, stuffWeDontLike, "")

Basically, I am looking for a regex filter - instead of entering the pattern to replace, I want to enter the pattern to retrieve.

percentages = Regex.Filter("50% of 50% is 25%", @"[0-9]+%")

Could we form a regex and invert the result, as if it were a selection? That would allow the use of regex replace. However, I could not find a way to easily invert a regex.

How can we achieve the desired result (or similar; a join or so seems acceptable) with very short and simple syntax, similar to regex replace?

Timo

3 Answers

You can use Regex.Matches and concatenate the value of each match. Pick whichever of the two variants you like best.

using System.Linq;
using System.Text.RegularExpressions;

// Extension methods cannot add static members to Regex, so use a helper class instead.
public static class RegExp
{
    // Usage: RegExp.Filter("50% of 50% is 25%", @"[0-9]+\%")
    public static string Filter(string input, string pattern)
    {
        // Concatenate the value of every match; yields "" when nothing matches.
        return Regex.Matches(input, pattern).Cast<Match>()
            .Aggregate(string.Empty, (a, m) => a + m.Value);
    }
}

public static class StringExtension
{
    // Usage: "50% of 50% is 25%".Filter(@"[0-9]+\%")
    public static string Filter(this string input, string pattern)
    {
        // Same logic as above, exposed as an extension method on string.
        return Regex.Matches(input, pattern).Cast<Match>()
            .Aggregate(string.Empty, (a, m) => a + m.Value);
    }
}
Xiaoy312
  • One-liners. Just what I was hoping for! In the context of multiple developers, a snippet was more suitable anyway. I now insert `Regex.Matches(input, pattern).Cast<Match>().Aggregate("", (s, m) => s + m.Value);` This solution communicates intent better than my replace abuse, while still being very concise. – Timo Jan 19 '15 at 08:52
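
The question worried about having to check `match.Success` or index into results. A minimal sketch of why that concern does not arise here: `Regex.Matches` returns an empty collection when nothing matches, so the concatenation just yields an empty string. The class name `FilterDemo` is hypothetical, and `string.Concat` is swapped in for `Aggregate` purely for illustration; it is not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; the name is illustrative only.
public static class FilterDemo
{
    // Concatenates the value of every match of pattern found in input.
    public static string Filter(string input, string pattern)
    {
        return string.Concat(
            Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        // prints 50%50%25%
        Console.WriteLine(Filter("50% of 50% is 25%", "[0-9]+%"));

        // No match: the result is an empty string, so this prints 0
        Console.WriteLine(Filter("no percentages here", "[0-9]+%").Length);
    }
}
```

No success check or index access is needed in either case.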
I don't understand why you want to use replace here. The Regex class has methods that return exactly the matches you want, so the roundabout route through replace seems unnecessary.

Just use Matches() to collect the matches. You could then join them into the string that you wanted.

var str = "50% of 50% is 25%";
var re = new Regex(@"\d+%");
var ms = re.Matches(str);
var values = ms.Cast<Match>().Select(m => m.Value);
var joined = String.Join("", values); // "50%50%25%"
Jeff Mercado
  • I collapsed this into `String.Join("", Regex.Matches(str, @"[0-9]+\%").Cast<Match>().Select(m => m.Value));` Preferred the solution using `Aggregate()` in the end, because it maintains the logical reading order even on a single line. – Timo Jan 19 '15 at 08:59
  • Btw, `[0-9]` cannot be freely replaced by `\d`, because the latter also matches other Unicode decimal digits, such as Arabic-Indic digits. I used `[0-9]` on purpose. – Timo Jan 19 '15 at 09:01
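
The difference between `\d` and `[0-9]` mentioned in the last comment can be checked directly. This is a small sketch, assuming a hypothetical `DigitClassDemo` class; the `RegexOptions.ECMAScript` line is an addition not mentioned in the thread, included because that option restricts `\d` to ASCII digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the name is illustrative only.
public static class DigitClassDemo
{
    // U+0663 ARABIC-INDIC DIGIT THREE, written as an escape to keep the source ASCII.
    public const string ArabicIndicThree = "\u0663";

    // Returns whether the sample digit matches the given pattern under the given options.
    public static bool Check(string pattern, RegexOptions options = RegexOptions.None)
    {
        return Regex.IsMatch(ArabicIndicThree, pattern, options);
    }

    public static void Main()
    {
        Console.WriteLine(Check(@"\d"));                           // True
        Console.WriteLine(Check("[0-9]"));                         // False
        Console.WriteLine(Check(@"\d", RegexOptions.ECMAScript));  // False
    }
}
```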
One solution is to use regex replace as follows:

Regex.Replace("50% of 50% is 25%", @"(\d+%)|(?:.+?)", "$1");

Output:

50%50%25%

As a general approach:

Regex.Replace(input, "(" + pattern + ")|(?:.+?)", "$1");

This finds anything that matches either of the following:

  • The pattern. Captured as $1. This is what we want to keep.
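  • One or more of any character, matched lazily. Because nothing follows it in the pattern, the lazy quantifier stops after a single character, so this consumes whatever the first group does not match, one character at a time. ?: because we don't need to capture this group.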

As MSDN states: "$1 replaces the entire match with the first captured subexpression." Where the first group matched, the match is replaced by itself and so kept; where only the second alternative matched, $1 is empty and the text is removed.

Effectively, this is the described regex filter.

Timo
  • I deleted my other comment. But.... what would be wrong with `(pattern)|.` instead of `(pattern)|(?:.+?)` and then still using `$1` in the replace section? – Dan W Jul 01 '15 at 17:48
  • I believe that should work. I had the hunch that a single `.+?` would be faster than repeating `.` to clear out chunks of uninteresting characters at a time, but I might very well be mistaken. – Timo Jul 02 '15 at 14:46
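
The two filler variants discussed in these comments can be compared side by side. This is a rough sketch, assuming a hypothetical `ReplaceFilterDemo` helper whose `filler` parameter is the second alternative. It also passes `RegexOptions.Singleline`, which is an addition to the answer: by default `.` does not match `\n`, so without that option any newlines in the input would survive the replace.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the name and the filler parameter are illustrative only.
public static class ReplaceFilterDemo
{
    // Keeps only the text matched by pattern: every match of the alternation
    // is replaced by group 1, which is empty whenever the filler matched instead.
    public static string Filter(string input, string pattern, string filler)
    {
        return Regex.Replace(
            input, "(" + pattern + ")|" + filler, "$1", RegexOptions.Singleline);
    }

    public static void Main()
    {
        string input = "50% of 50%\nis 25%";

        Console.WriteLine(Filter(input, "[0-9]+%", "(?:.+?)")); // 50%50%25%
        Console.WriteLine(Filter(input, "[0-9]+%", "."));       // 50%50%25%
    }
}
```

Both fillers give the same result. A lazy `.+?` with nothing after it matches exactly one character per match, so it does not skip larger chunks than a plain `.` does.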