
I want to catch numbers appearing anywhere in a string, and replace them with "(.+)".

But I want to catch only those numbers that have an even number of % signs preceding them. It doesn't matter if surrounding characters get caught up in the match: capture groups can be used to filter out the numbers.

I'm unable to come up with an ECMAScript regular expression for this.

Here is the playground:

abcd %1 %%2 %%%3 %%%%4 efgh

abcd%12%%34%%%666%%%%11efgh

A successful catch will behave like this:
[image: desired behaviour]


Things I have tried:

attempt 1

attempt 2

attempt 3


As you may have noticed, the third attempt almost works; the only problems are in the second line of the playground. What I was trying to express with that pattern is:

Match a number if it is preceded by an even number of % signs AND either of the following is true:

  • The whole expression above is preceded by nothing (the absence of any character, consumed or otherwise).
  • The whole expression above is preceded by a character other than %.

Is there a way to match the absence of a character?
That's what I was trying to do by using \0 in the third attempt.

Wiktor Stribiżew
AneesAhmed777

3 Answers


You can use (?:[^%\d]|^|\b(?=%))(?:%%)*(\d+) as a pattern; the number is stored in the first capturing group. This also handles numbers preceded by zero % characters.

This matches the even number of % signs if they are preceded by:

  • a character that is neither % nor a digit (digits are excluded on purpose: consuming the last digit before a % would break chains like %%1%%2, because a character consumed by one match cannot be reused by the next)
  • the start of the string
  • a word boundary followed by % (i.e. the % run directly follows a word character such as a digit), which handles the chains mentioned above without consuming anything

You can see it in action here
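A minimal JavaScript sketch of putting this pattern to work on the playground. It assumes you want to keep whatever the pattern consumes in front of the digits, so it wraps the prefix and the even % run in one extra capturing group (an addition, not part of the pattern above); a plain $1(.+) replacement then restores them.

```javascript
// The pattern above, with one extra group around everything before the digits,
// so the consumed prefix (a non-%/non-digit char, or nothing) can be put back.
const evenPercentNumber = /((?:[^%\d]|^|\b(?=%))(?:%%)*)(\d+)/g;

const lines = [
  "abcd %1 %%2 %%%3 %%%%4 efgh",
  "abcd%12%%34%%%666%%%%11efgh",
];

// $1 re-inserts the prefix and the even % run; the digits become "(.+)".
const replaced = lines.map(line => line.replace(evenPercentNumber, "$1(.+)"));

replaced.forEach(line => console.log(line));
// abcd %1 %%(.+) %%%3 %%%%(.+) efgh
// abcd%12%%(.+)%%%666%%%%(.+)efgh
```

The same pattern only uses non-capturing groups, a lookahead and \b, all of which the ECMAScript grammar of C++ std::regex also accepts.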

Sebastian Proske
  • +1 Yes `0` is to be considered a valid even number. Using ideas from your expression, I came up with this: `(^|[^%\d]|[^%](?=%))(%%)*(\d+)`. Is it good? Can you improve it? – AneesAhmed777 Jul 10 '16 at 16:26
  • I have edited my answer accordingly. I would prefer the use of `\b` over `[^%]` for the mentioned chains (you won't completely catch them with your approach) – Sebastian Proske Jul 10 '16 at 17:26
  • @Wiktor Can you show a case where the updated expression is not working? – AneesAhmed777 Jul 11 '16 at 07:55
  • @AneesAhmed777: Perhaps, it will work OK, I take my words back. However, it is definitely much less readable for the majority of coders (not for me, I'd come up with this myself, but Seba was the first :) ). – Wiktor Stribiżew Jul 11 '16 at 08:02
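The chain problem mentioned in the comments can be checked with a small JavaScript sketch, using the two patterns exactly as written above and a made-up input %%2%%4. The comment's [^%](?=%) alternative has to consume the 2 as the prefix of the second match, but that 2 was already consumed by the first match; \b(?=%) is zero-width, so it still works there.

```javascript
// Pattern proposed in the comment (the number is in group 3).
const commentPattern = /(^|[^%\d]|[^%](?=%))(%%)*(\d+)/g;
// Pattern from the answer (the number is in group 1); \b(?=%) consumes nothing.
const boundaryPattern = /(?:[^%\d]|^|\b(?=%))(?:%%)*(\d+)/g;

// Collect the captured number from every match.
function numbers(re, group, text) {
  return Array.from(text.matchAll(re), m => m[group]);
}

const chain = "%%2%%4"; // both numbers follow an even run of % signs

console.log(numbers(commentPattern, 3, chain));  // [ '2' ]
console.log(numbers(boundaryPattern, 1, chain)); // [ '2', '4' ]
```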

Issue

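You want a regex with a positive infinite-width (variable-length) lookbehind:

(?<=(^|[^%])(?:%%)*)(?<!\d)\d+

The (?<!\d) guard keeps the match from starting in the middle of a number: without it, the 2 in %12 would be matched, because the lookbehind accepts the preceding 1 as a non-% character.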

Here is the .NET regex demo

ECMAScript 2016 (ES7) does not support lookbehind, so you need language-specific means: use a simplified regex that matches any number of % signs before a digit sequence, /(%*)(\d+)/g, and then check inside the replace callback whether the number of percent signs is even, proceeding accordingly.
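Lookbehind was added to the language later, in ECMAScript 2018, so a current JavaScript engine (for example a recent Node.js) can run the lookbehind idea directly; the ECMAScript grammar of C++ std::regex still has no lookbehind, so this does not help there. A minimal sketch, assuming an ES2018-capable engine; it includes a (?<!\d) guard so that a match cannot start inside a longer number, such as at the 2 in %12.

```javascript
// Requires ES2018 lookbehind support (variable width is allowed in JS).
// (?<=(?:^|[^%])(?:%%)*) : an even run of % signs, preceded by start of input or a non-%
// (?<!\d)                : do not start inside a longer number
const lookbehindPattern = /(?<=(?:^|[^%])(?:%%)*)(?<!\d)\d+/g;

const input = "abcd %1 %%2 %%%3 %%%%4 efgh\nabcd%12%%34%%%666%%%%11efgh";

// Only the digits are matched, so the replacement is simply "(.+)".
const output = input.replace(lookbehindPattern, "(.+)");
console.log(output);
// abcd %1 %%(.+) %%%3 %%%%(.+) efgh
// abcd%12%%(.+)%%%666%%%%(.+)efgh
```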

JavaScript

Instead of trying to emulate a variable-width lookbehind, you may just use JS means:

var re = /(%*)(\d+)/g;          // Capture into Group 1 zero or more percentage signs
var str = 'abcd %1 %%2 %%%3 %%%%4 efgh<br/><br/>abcd%12%%34%%%666%%%%11efgh';
var res = str.replace(re, function(m, g1, g2) { // Use a callback inside replace
  return (g1.length % 2 === 0) ? g1 + '(.+)' : m; // If the length of the %s is even
});                             // Return Group 1 + (.+), else return the whole match
document.body.innerHTML = res;

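If there must be at least two % signs before the digits (so that zero no longer counts as even), use the /(%+)(\d+)/g pattern instead, where %+ matches one or more percent signs; the even-length check in the callback then rules out a single %.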
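A short JavaScript sketch contrasting the two patterns with the same even-length callback; the helper name replaceEven and the sample string are illustrative only.

```javascript
// Replace a number with "(.+)" when the run of % signs before it has even length.
function replaceEven(re, text) {
  return text.replace(re, (m, percents) =>
    percents.length % 2 === 0 ? percents + "(.+)" : m);
}

const sample = "a1 %2 %%3";

// %* : zero % signs counts as even, so the 1 in "a1" is replaced too.
console.log(replaceEven(/(%*)(\d+)/g, sample)); // a(.+) %2 %%(.+)
// %+ : at least one % is required, so only "%%3" qualifies.
console.log(replaceEven(/(%+)(\d+)/g, sample)); // a1 %2 %%(.+)
```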

Conversion to C++

The same algorithm can be used in C++. The only problem is that std::regex_replace has no built-in support for a callback. It can be added manually and used like this:

#include <algorithm>  // std::for_each
#include <iostream>
#include <iterator>   // std::advance
#include <string>
#include <regex>
using namespace std;

template<class BidirIt, class Traits, class CharT, class UnaryFunction>
std::basic_string<CharT> regex_replace(BidirIt first, BidirIt last,
    const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
    std::basic_string<CharT> s;

    typename std::match_results<BidirIt>::difference_type
        positionOfLastMatch = 0;
    auto endOfLastMatch = first;

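    // Called once per match: copy the text between the previous match and
    // this one, then append whatever the user callback returns for the match.
    auto callback = [&](const std::match_results<BidirIt>& match)
    {
        // position() is relative to `first`; diff is the length of the unmatched gap.
        auto positionOfThisMatch = match.position(0);
        auto diff = positionOfThisMatch - positionOfLastMatch;

        auto startOfThisMatch = endOfLastMatch;
        std::advance(startOfThisMatch, diff);

        s.append(endOfLastMatch, startOfThisMatch);  // unmatched text
        s.append(f(match));                          // replacement text

        auto lengthOfMatch = match.length(0);

        positionOfLastMatch = positionOfThisMatch + lengthOfMatch;

        endOfLastMatch = startOfThisMatch;
        std::advance(endOfLastMatch, lengthOfMatch);
    };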

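    // Walk every match with an iterator over the caller's iterator type,
    // then copy whatever text follows the last match.
    std::regex_iterator<BidirIt, CharT, Traits> begin(first, last, re), end;
    std::for_each(begin, end, callback);

    s.append(endOfLastMatch, last);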

    return s;
}

template<class Traits, class CharT, class UnaryFunction>
std::string regex_replace(const std::string& s,
    const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
    return regex_replace(s.cbegin(), s.cend(), re, f);
}

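// Group 1 holds the run of % signs, group 0 the whole match.
std::string my_callback(const std::smatch& m) {
  if (m.str(1).length() % 2 == 0) {  // even number of % signs
    return m.str(1) + "(.+)";        // keep the % signs, replace the digits
  } else {
    return m.str(0);                 // odd: leave the match unchanged
  }
}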

int main() {
    std::string s = "abcd %1 %%2 %%%3 %%%%4 efgh\n\nabcd%12%%34%%%666%%%%11efgh";
    cout << regex_replace(s, regex("(%*)(\\d+)"), my_callback) << endl;

    return 0;
}

See the IDEONE demo.

Special thanks for the callback code goes to John Martin.

Community
Wiktor Stribiżew
  • Not helpful: when I say ECMAScript, I mean it (I'm using the C++ standard library). Not downvoting because it may be useful to someone in the future. – AneesAhmed777 Jul 10 '16 at 16:32
  • Look at the tags - there is no C++ tag. There is JavaScript tag though. – Wiktor Stribiżew Jul 10 '16 at 16:33
  • In my defence: It was added by someone else. – AneesAhmed777 Jul 10 '16 at 16:35
  • I added a C++ implementation. – Wiktor Stribiżew Jul 10 '16 at 16:56
  • +1 Mind Blowing !!! I express my gratitude but this thread is more about regex than c++/js. Nice work though. – AneesAhmed777 Jul 11 '16 at 08:11
  • Well, a regex often goes hand in hand with code around it. If you can afford a couple of code lines more that will help simplify the regex, it is usually of great help to all those developers who are going to maintain the code after you leave the company. Not so many people are versed in word boundaries with lookaheads. A lot even make mistakes when writing character classes. – Wiktor Stribiżew Jul 11 '16 at 08:16
  • In the past few hours, I realised the practicality of your method. So, I'm using it in my code. It is easily scalable to any problem, unlike the complex regex solution I asked for. But can't accept as answer, because @Sebastian's answer is more valid for my *limited* problem. But for a general and *unlimited* problem, your solution is awesome. :-) – AneesAhmed777 Jul 11 '16 at 15:25
  • As you wish. Accept only the answers that worked best for you and this is subjective - as anything here on SO. Happy coding! – Wiktor Stribiżew Jul 11 '16 at 15:27

I don't know ECMAScript, but the following documentation has the answer:

ECMAScript regex

Search for negative lookahead, which will result in something like this:

(?!%)(([%]{2})*\d+)

...where (?!%) means not preceded by % literal.

Alan Moore
Ruben Pirotte
  • `(?!%)` means not **followed** by `%`. You're thinking of negative *lookbehind*, which is not supported in ECMAScript. – Alan Moore Jul 10 '16 at 18:42
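The point about lookahead can be checked quickly in JavaScript: running the answer's pattern unchanged on the two playground lines matches every number, including those after an odd number of % signs.

```javascript
// The pattern from this answer, unchanged.
const lookaheadPattern = /(?!%)(([%]{2})*\d+)/g;

// (?!%) only checks that the *next* character is not %. A match therefore
// never begins on a %, so ([%]{2})* always matches nothing and every digit
// run is matched, whatever precedes it.
console.log("abcd %1 %%2 %%%3 %%%%4 efgh".match(lookaheadPattern)); // [ '1', '2', '3', '4' ]
console.log("abcd%12%%34%%%666%%%%11efgh".match(lookaheadPattern)); // [ '12', '34', '666', '11' ]
```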