Regex: replace a sequence of one character with the same number of another character

Question

Let's say I have a string like this:

=====

and I want to replace it with this:

-----

I only want to replace it if it has more than a certain number of that character (we'll say > 3).

So, these should be the replacements:

=== -> ===
==== -> ----
===== -> -----

The application: I want to replace all level 1 heading marks in Markdown with level 2 marks, without changing embedded code blocks.

I know I can do this:

/=/-/g, but this matches anything with an equals sign (if (x == y)), which is undesirable.

or this:

/===+/----/g, but this doesn't account for the length of the original matched string.

Is this possible?

I intentionally left that out. I am using vim or sed for doing regex, but I'd like a general solution that can be applied to any language. — beatgammit, Sep 07 '11 at 18:13
I don't think that there's a generic way to do what you want, in pure RE. As mentioned below, various implementations have added features to allow it, but nothing that would work across everything that supports RE. — zigdon, Sep 07 '11 at 18:23
So I intentionally didn't answer :-). The tag description says: "Therefore, when asking questions, always include the specific programming language or tool." — justintime, Sep 07 '11 at 20:11
Same question for javascript: http://stackoverflow.com/questions/7456559/javascript-regex-replace-sequence-of-characters-with-same-number-of-another-cha — blahdiblah, Apr 27 '17 at 20:02
Same question for PHP: http://stackoverflow.com/questions/6149555/how-do-i-replace-multiple-characters-with-the-same-number-of-characters-with-a-r?noredirect=1&lq=1 — blahdiblah, Apr 27 '17 at 20:02

score 10 · Accepted Answer · answered Sep 07 '11 at 17:37

It's possible with Perl:

my $string = "===== Hello World ====";
$string =~ s/(====+)/"-" x length($1)/eg;
# $string contains ----- Hello World ----

The /e flag makes Perl evaluate the replacement part of s/// as an expression. You can try this as a one-liner:

perl -e '$ARGV[0] =~ s/(====+)/"-" x length($1)/eg; print $ARGV[0]' "===== Hello World ===="
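
For readers outside Perl, here is a minimal Python sketch of the same idea. It assumes `re.sub` with a function as the replacement, which plays roughly the role of Perl's /e flag. The helper name `dash_run` is made up for illustration.

```python
import re

def dash_run(match):
    # Build a run of "-" exactly as long as the matched run of "=".
    return "-" * len(match.group(0))

text = "===== Hello World ===="
# ={4,} matches runs of four or more "=", the same runs as ====+ above.
result = re.sub(r"={4,}", dash_run, text)
print(result)  # ----- Hello World ----
```

Runs shorter than four characters never match, so `===` and `x == y` are left alone.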

yko

I like the command-line version of this. I'd still like something more general, but this solves the problem. – beatgammit Sep 07 '11 at 18:27
@tjameson, I'm not sure what you mean by "more general", but you may also like a shorter version of the regexp: `s/(={4,})/"-" x length($1)/eg`, where `={4,}` means 4 or more `=` characters – yko Sep 07 '11 at 18:48
I just meant not perl. Something that will work on most regex platforms (sed, vim, etc). Anyway, the problem is solved, so thanks. – beatgammit Sep 08 '11 at 00:56

score 4 · Answer 2 · answered Sep 07 '11 at 17:36

It depends on which language you're using. In some languages you can put code on the right-hand side of the substitution, allowing you to do something like this (in Perl):

s/(=+)/(length($1) > 3 ? "-" : "=") x length($1)/eg

The 'e' flag tells Perl to execute the code on the right-hand side of the expression instead of just treating it as a string, and the 'g' flag applies the substitution to every run rather than only the first.
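
The same "decide inside the replacement" approach can be sketched in Python, assuming a callback passed to `re.sub`. The names `replace_long_runs` and `min_len` are hypothetical and only serve the illustration.

```python
import re

def replace_long_runs(text, min_len=4):
    """Replace each run of "=" of at least min_len characters with dashes."""
    def swap(match):
        run = match.group(0)
        # Short runs are returned unchanged; long ones become dashes.
        return "-" * len(run) if len(run) >= min_len else run
    return re.sub(r"=+", swap, text)

print(replace_long_runs("=== ==== ===== if (x == y)"))
# === ---- ----- if (x == y)
```

Matching every run with `=+` and filtering in the callback keeps the threshold out of the pattern, so it can be changed without editing the regex.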

zigdon

I'm not using a language. I'm just using sed or vim's regex. Is it possible without using a language? – beatgammit Sep 07 '11 at 17:39
Pretty sure vim's language is good enough to allow you to do something similar - see the `\=` operator in the replace command. – zigdon Sep 07 '11 at 18:05
@tjameson: You *are* using “a language”, no matter what you are using. Also, I can see no possible reason why anyone would ever use `sed -e 's/foo/bar/g'` when `perl -pe 's/foo/bar/g'` works so much better. **“Try it, you’ll *like* it!”** – tchrist Sep 07 '11 at 18:14
@tchrist- Technically, regex recognizes a language. Certain features are common across implementations of regular expressions. It is that subset that I am trying to access. It is not tied to an implementation, but to generally accepted "standard" features. I want the same thing to work in sed, vim, perl, boost for c++, javascript, python, etc without changing very much except maybe a little syntax. That being said, I understand that this answer has its merits, but it is specific to Perl, which is not what I asked for. If I can't find a more general solution, I'll accept this. – beatgammit Sep 07 '11 at 18:18
If you try to stick to the common subset of ALL regex implementations you would be rather restricted. Grouping is ( ) or \( \), etc. – justintime Sep 07 '11 at 20:15

Dominic Comtois · Answer 3 · 2020-04-19T09:39:23.983

I was also looking for a pure regex solution for something like this. I didn't find one on SO, so I worked it out.

Short version: here is the regex:

((?<====)=)|(=(?====))|((?<===)=(?==))|((?<==)=(?===))

Here is how I got there, using R:

str <- " = == === ==== ===== ====== ======="

gsub("=(?====)",      "-", str, perl = TRUE) # (1) Pos. lookahead
gsub("(?<====)=",     "-", str, perl = TRUE) # (2) Pos. lookbehind
gsub("(?<===)=(?==)", "-", str, perl = TRUE) # (3) Middle part for cases of 4 or 5 ='s (1/2)
gsub("(?<==)=(?===)", "-", str, perl = TRUE) # (4) Middle part for cases of 4 or 5 ='s (2/2)

# Combining all, we have:
gsub("((?<====)=)|(=(?====))|((?<===)=(?==))|((?<==)=(?===))", "-", str, perl = TRUE) # (5)

(1) = == === -=== --=== ---=== ----===
(2) = == === ===- ===-- ===--- ===----
(3) = == === ==-= ==--= ==---= ==----=
(4) = == === =-== =--== =---== =----==
(5) = == === ---- ----- ------ -------
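
The combined regex follows a pattern: each alternative matches one `=` that sits at a given offset inside some window of four consecutive `=`. As a sketch of how that generalizes to other thresholds, the following Python builds the alternation programmatically. It assumes an engine with fixed-width lookbehind (Python's `re` qualifies); the function name `long_run_pattern` and its parameters are invented for this example.

```python
import re

def long_run_pattern(char="=", min_len=4):
    """Build a lookaround-only pattern for each char inside a run of >= min_len."""
    c = re.escape(char)
    parts = []
    for before in range(min_len):
        after = min_len - 1 - before
        # The character must have `before` copies behind it and `after` ahead.
        behind = f"(?<={c * before})" if before else ""
        ahead = f"(?={c * after})" if after else ""
        parts.append(f"{behind}{c}{ahead}")
    return "|".join(parts)

pattern = long_run_pattern()
text = " = == === ==== ===== ====== ======="
print(re.sub(pattern, "-", text))
#  = == === ---- ----- ------ -------
```

With the default `min_len=4` this produces the same four alternatives as regex (5), just in a different order.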

An alternative method with less convoluted regexes (but requiring 3 steps):

# First, deal with 4 & 5 equal signs with negative look-behind and lookahead
str <- gsub("(?<!=)={4}(?!=)", "----",     str, perl = TRUE) # (2.1)
str <- gsub("(?<!=)={5}(?!=)", "-----",    str, perl = TRUE) # (2.2)

# Then use regexes (1) and (2) from above for 6+ equal signs
str <- gsub("((?<====)=)|(=(?====))", "-", str, perl = TRUE) # (2.3)

(2.1) = == === ---- ===== ====== =======
(2.2) = == === ---- ----- ====== =======
(2.3) = == === ---- ----- ------ -------

score 1 · Answer 4 · answered Jun 25 '20 at 10:19

Specifically for this Markdown headings use case, you can use the fact that these always come at the start of a line:

/(^=|(?<==)=)/-/g

will replace all the '=' characters that are either at the start of a line, or have a '=' before them.

It'll zap double '==' in the text though... perhaps someone can improve on that?
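
One possible improvement, sketched in Python under some assumptions: treat only lines that consist entirely of four or more `=` as heading underlines, and skip anything between fence lines. The names `demote_setext_headings`, `UNDERLINE` and `FENCE` are hypothetical, the fence tracking is a simple toggle that does not check that opening and closing fences match, and indented code blocks are not handled.

```python
import re

UNDERLINE = re.compile(r"^(={4,})[ \t]*$")   # a line made only of "="
FENCE = re.compile(r"^\s*(`{3,}|~{3,})")     # start or end of a fenced block

def demote_setext_headings(markdown):
    """Turn '====' underlines into '----', leaving fenced code untouched."""
    out = []
    in_fence = False
    for line in markdown.split("\n"):
        if FENCE.match(line):
            in_fence = not in_fence  # toggle on every fence line
        elif not in_fence:
            m = UNDERLINE.match(line)
            if m:
                # Same number of dashes, trailing whitespace preserved.
                line = "-" * len(m.group(1)) + line[m.end(1):]
        out.append(line)
    return "\n".join(out)

doc = "Title\n=====\n\n~~~\nif (x == y)\n=====\n~~~\n"
print(demote_setext_headings(doc))
```

Because the pattern is anchored to the whole line, an expression such as `x == y` in running text is never touched.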
