18

Let's say I have a string like this:

=====

and I want to replace it with this:

-----

I only want to replace it if it has more than a certain number of that character (we'll say > 3).

So, these should be the replacements:

=== -> ===
==== -> ----
===== -> -----

The application: I want to replace all level 1 heading marks in Markdown with level 2 marks, without changing embedded code blocks.

I know I can do this:

/=/-/g, but this matches anything with an equals sign (if (x == y)), which is undesirable.

or this:

/====+/----/g, but this doesn't account for the length of the original matched string.

Is this possible?

beatgammit
  • I intentionally left that out. I am using vim or sed for doing regex, but I'd like a general solution that can be applied to any language. – beatgammit Sep 07 '11 at 18:13
  • I don't think that there's a generic way to do what you want, in pure RE. As mentioned below, various implementations have added features to allow it, but nothing that would work across everything that supports RE. – zigdon Sep 07 '11 at 18:23
  • So I intentionally didn't answer :-). The tag description says: "Therefore, when asking questions, always include the specific programming language or tool." – justintime Sep 07 '11 at 20:11
  • Same question for JavaScript: http://stackoverflow.com/questions/7456559/javascript-regex-replace-sequence-of-characters-with-same-number-of-another-cha – blahdiblah Apr 27 '17 at 20:02
  • Same question for PHP: http://stackoverflow.com/questions/6149555/how-do-i-replace-multiple-characters-with-the-same-number-of-characters-with-a-r?noredirect=1&lq=1 – blahdiblah Apr 27 '17 at 20:02

4 Answers

10

It's possible with Perl:

my $string = "===== Hello World ====";
$string =~ s/(====+)/"-" x length($1)/eg;
# $string contains ----- Hello World ----

The /e flag makes Perl evaluate the replacement part of s/// as an expression instead of treating it as a plain string. You can try this as a one-liner:

perl -e '$ARGV[0] =~ s/(====+)/"-" x length($1)/eg; print $ARGV[0]' "===== Hello World ===="
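
For readers outside Perl, the same idea (compute the replacement from the length of each match) can be sketched in Python, where `re.sub` accepts a function as the replacement. The helper name `dashify` and its `min_run` parameter are made up for this illustration; this is a sketch of the technique, not part of the original answer.

```python
import re


def dashify(text: str, min_run: int = 4) -> str:
    """Replace each run of at least `min_run` '=' with the same number of '-'.

    `dashify` and `min_run` are hypothetical names used only for this sketch.
    """
    pattern = re.compile(r"={%d,}" % min_run)
    # The callable receives each match object; the replacement is built
    # from the matched run's length, like "-" x length($1) in Perl.
    return pattern.sub(lambda m: "-" * len(m.group(0)), text)


print(dashify("===== Hello World ===="))  # ----- Hello World ----
print(dashify("if (x == y) === z"))       # unchanged: no run reaches 4
```

Most languages with a regex library offer a comparable callback form of replace, so the approach carries over even though the syntax does not.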
yko
  • I like the command-line version of this. I'd still like something more general, but this solves the problem. – beatgammit Sep 07 '11 at 18:27
  • @tjameson, I'm not sure what you mean by "more general", but you may also like a shorter version of the regexp: `s/(={4,})/"-" x length($1)/eg` where `={4,}` means 4 or more `=` characters – yko Sep 07 '11 at 18:48
  • I just meant not perl. Something that will work on most regex platforms (sed, vim, etc). Anyway, the problem is solved, so thanks. – beatgammit Sep 08 '11 at 00:56
4

It depends on what language you're using. In some languages you can put code in the replacement side of the regexp, which allows something like this (this is in Perl):

s/(=+)/(length($1) > 3 ? "-" : "=") x length($1)/eg

The 'e' flag tells Perl to evaluate the replacement side as code instead of treating it as a string; the 'g' flag applies the substitution to every run, not just the first.

zigdon
  • I'm not using a language. I'm just using sed or vim's regex. Is it possible without using a language? – beatgammit Sep 07 '11 at 17:39
  • Pretty sure vim's language is good enough to allow you to do something similar - see the `\=` operator in the replace command. – zigdon Sep 07 '11 at 18:05
  • @tjameson: You *are* using “a language”, no matter what you are using. Also, I can see no possible reason why anyone would ever use `sed -e 's/foo/bar/g'` when `perl -pe 's/foo/bar/g'` works so much better. **“Try it, you’ll *like* it!”** – tchrist Sep 07 '11 at 18:14
  • @tchrist- Technically, regex recognizes a language. Certain features are common across implementations of regular expressions. It is that subset that I am trying to access. It is not tied to an implementation, but to generally accepted "standard" features. I want the same thing to work in sed, vim, perl, boost for c++, javascript, python, etc without changing very much except maybe a little syntax. That being said, I understand that this answer has its merits, but it is specific to Perl, which is not what I asked for. If I can't find a more general solution, I'll accept this. – beatgammit Sep 07 '11 at 18:18
  • If you try to stick to the common subset of ALL regex implementations, you would be rather restricted. Grouping is ( ) or \( \), etc. – justintime Sep 07 '11 at 20:15
2

I was also looking for a pure regex solution for something like this. I didn't find one on SO, so I worked it out.

Short version: here is the regex:

((?<====)=)|(=(?====))|((?<===)=(?==))|((?<==)=(?===))

Here is how I got there, using R:

str <- " = == === ==== ===== ====== ======="

gsub("=(?====)",      "-", str, perl = TRUE) # (1) Pos. lookahead
gsub("(?<====)=",     "-", str, perl = TRUE) # (2) Pos. lookbehind
gsub("(?<===)=(?==)", "-", str, perl = TRUE) # (3) Middle part for cases of 4 or 5 ='s (1/2)
gsub("(?<==)=(?===)", "-", str, perl = TRUE) # (4) Middle part for cases of 4 or 5 ='s (2/2)

# Combining all, we have:
gsub("((?<====)=)|(=(?====))|((?<===)=(?==))|((?<==)=(?===))", "-", str, perl = TRUE) # (5)

(1) = == === -=== --=== ---=== ----===
(2) = == === ===- ===-- ===--- ===----
(3) = == === ==-= ==--= ==---= ==----=
(4) = == === =-== =--== =---== =----==
(5) = == === ---- ----- ------ -------
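
The combined pattern follows a regular scheme: for a minimum run length n, each alternative asserts k `=` signs before and n-1-k after the matched `=`, for k from 0 to n-1. A minimal Python sketch of that construction is below; `run_pattern` is a hypothetical helper written for this illustration, and it assumes an engine with fixed-width lookbehind (Python's `re` qualifies).

```python
import re


def run_pattern(min_run: int = 4, char: str = "=") -> str:
    """Build a lookaround-only regex matching each `char` that sits inside
    a run of at least `min_run` copies. (Hypothetical helper for illustration.)
    """
    c = re.escape(char)
    alternatives = []
    for before in range(min_run):
        after = min_run - 1 - before
        # Omit a lookaround entirely when it would assert zero characters.
        behind = "(?<=%s)" % (c * before) if before else ""
        ahead = "(?=%s)" % (c * after) if after else ""
        alternatives.append(behind + c + ahead)
    return "|".join(alternatives)


s = " = == === ==== ===== ====== ======="
print(re.sub(run_pattern(4), "-", s))
# prints " = == === ---- ----- ------ -------"
```

With `min_run=4` this produces the same four alternatives as regex (5), only in a different order, and each `=` is replaced individually so the run length is preserved without any callback.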

Alternative method for a less convoluted regex (but requires 3 steps)

# First, deal with 4 & 5 equal signs with negative lookbehind and lookahead
str <- gsub("(?<!=)={4}(?!=)", "----",     str, perl = TRUE) # (2.1)
str <- gsub("(?<!=)={5}(?!=)", "-----",    str, perl = TRUE) # (2.2)

# Then combine regexes (1) and (2) from above for 6+ equal signs
str <- gsub("((?<====)=)|(=(?====))", "-", str, perl = TRUE) # (2.3)

(2.1) = == === ---- ===== ====== =======
(2.2) = == === ---- ----- ====== =======
(2.3) = == === ---- ----- ------ -------
Dominic Comtois
1

Specifically for this Markdown headings use case, you can use the fact that these always come at the start of a line:

/(^=|(?<==)=)/-/g

This replaces every '=' character that is either at the start of a line or preceded by another '='.

It'll zap double '==' in the text though... perhaps someone can improve on that?
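
One possible improvement, sketched in Python under the assumption that a heading underline occupies a whole line by itself: anchor the match at both ends of the line so that an `==` in running text is never touched. The names `HEADING_UNDERLINE` and `demote_headings` are invented for this sketch.

```python
import re

# A line made up only of 4+ '=' (optionally followed by spaces or tabs).
# re.MULTILINE makes ^ and $ match at every line boundary.
HEADING_UNDERLINE = re.compile(r"^(={4,})[ \t]*$", re.MULTILINE)


def demote_headings(markdown: str) -> str:
    """Turn '=====' underlines into '-----' of the same length.

    Hypothetical helper for illustration; trailing whitespace after the
    underline is dropped as a side effect.
    """
    return HEADING_UNDERLINE.sub(lambda m: "-" * len(m.group(1)), markdown)


doc = "Title\n=====\n\n    if (x == y) return;\n\nshort\n===\n"
print(demote_headings(doc))
```

Indented code is left alone because the pattern requires the `=` run to start at column 0. A bare line of `====` inside a fenced code block would still be converted, so this is not a full Markdown-aware solution.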

joachim