Can regex alternation be used in replace?

Question

I doubt it's possible, but I haven't found anything that specifically says it isn't. Is there some way to construct a parallel alternation in a search-and-replace regex? For example, if I wanted to replace street types with their abbreviations, could I do something like this:

s/(STREET|AVENUE|BOULEVARD)/(ST|AVE|BLVD)/

without having the entire right-hand side substituted in literally? Or do I really have to do a separate replace for each street type?

Language? You can do this in Perl and Python by calling a function. — dawg, Sep 27 '16 at 19:46
What language are you using? Many languages allow you to use a function when replacing, and then it can provide different replacements depending on the matched string. E.g. PHP `preg_replace_callback()`. — Barmar, Sep 27 '16 at 19:47
If you're doing this in a text editor, it's probably not possible. — Barmar, Sep 27 '16 at 19:48
[It is possible in Notepad++.](http://stackoverflow.com/questions/37160927/how-to-use-conditionals-when-replacing-in-notepad-via-regex/37161309#37161309) — Wiktor Stribiżew, Sep 27 '16 at 19:50
This could be done in Dreamweaver too: `(?:(ST)REET|(AVE)NUE|(B)OU(L)E(V)AR(D))`, replaced with `$1$2$3$4$5$6`. Knowing where you're doing this would help this question a lot. – chris85, Sep 27 '16 at 19:57

SamWhan · Accepted Answer · score 3 · 2016-09-28T14:01:40.570

This isn't that pretty, but it'll get the job done:

Replace

(?:(ST)REET|(AVE)NUE|(B)OU(L)E(V)AR(D))

with

\1\2\3\4\5\6

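It matches each whole word while capturing only the letters that make up its abbreviation. Replacing with all six capture groups inserts those letters; the groups belonging to the alternatives that didn't match are empty, so they add nothing.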

See it here at regex101.
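A minimal sketch of the same trick in Python's `re` module (Python is my choice here; the answer itself isn't tied to a language). It assumes Python 3.5 or later, where groups from the alternatives that didn't match expand to an empty string; `abbreviate` is a hypothetical helper name.

```python
import re

# SamWhan's pattern: each alternative captures only the letters kept in its abbreviation
PATTERN = re.compile(r'(?:(ST)REET|(AVE)NUE|(B)OU(L)E(V)AR(D))')

def abbreviate(text):
    # Groups from alternatives that did not match expand to '' (Python 3.5+)
    return PATTERN.sub(r'\1\2\3\4\5\6', text)

print(abbreviate('Fourth STREET, Park AVENUE, Sunset BOULEVARD'))
```

Because the pattern has no word boundaries, it would also shorten the start of a longer word (STREETS becomes STS); wrapping it as `\b(?:...)\b` avoids that.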

edited Sep 28 '16 at 14:01

answered Sep 27 '16 at 20:00

SamWhan


So, can we up the ante with parallel replacements that aren't strictly abbreviations, so that /(FIRST|SECOND|THIRD)/ could be replaced by 1ST|2ND|3RD? – Eddie Rowe Sep 28 '16 at 15:13

Not without programming logic, to my knowledge (or, as mentioned, Notepad++ and the like). – SamWhan Sep 28 '16 at 15:30
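To illustrate the "programming logic" route for the FIRST/SECOND/THIRD case, here is a minimal Python sketch (the names `ORDINALS` and `replace_words` are hypothetical): a dictionary holds arbitrary replacements, the pattern is built from its keys, and a callback looks up each matched word.

```python
import re

# Arbitrary mapping: the replacements need not be substrings of the words
ORDINALS = {'FIRST': '1ST', 'SECOND': '2ND', 'THIRD': '3RD'}

def replace_words(text, mapping):
    # Build \b(?:FIRST|SECOND|THIRD)\b from the keys so pattern and mapping stay in sync
    pattern = r'\b(?:' + '|'.join(map(re.escape, mapping)) + r')\b'
    # The callback receives each match and returns the mapped replacement
    return re.sub(pattern, lambda m: mapping[m.group(0)], text)

print(replace_words('FIRST AVENUE and THIRD STREET', ORDINALS))
```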

Casimir et Hippolyte · Answer 2 · score 3 · 2016-09-28T10:08:32.017

Just for fun, and for these three words only, in PCRE, Perl, the Python regex module, or Notepad++:

(?:\G(?!^)|\b(?=(?:STREET|AVENUE|BOULEVARD)\b))[A-Z]*?\K(?:TREE|E(?:NU)?|OU|AR)\B

Replace with the empty string.

demo

Or this one:

\G[A-Z]*?(?>\W*\b(?>\w+\W+)*?(?=(?:STREET|AVENUE|BOULEVARD)\b))?[A-Z]*?\K(?:TREE\B|E(?:NU)?\B|OU\B|AR\B)

demo
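Python's built-in `re` module doesn't support `\G` or `\K`, so this is only a rough stdlib approximation of the same deletion idea, not Casimir's regex itself: match each whole street word, then delete the same fragments (`TREE`, `E` or `ENU`, `OU`, `AR`, each only when another letter follows) inside it with a nested `re.sub`. The function name `shorten` is hypothetical.

```python
import re

WORDS = re.compile(r'\b(?:STREET|AVENUE|BOULEVARD)\b')
# Fragments to delete; the trailing \B requires another letter to follow,
# so the final E of AVENUE survives
FRAGMENTS = re.compile(r'(?:TREE|E(?:NU)?|OU|AR)\B')

def shorten(text):
    # Delete the fragments only inside the matched street words
    return WORDS.sub(lambda m: FRAGMENTS.sub('', m.group(0)), text)

print(shorten('Fourth STREET, Park AVENUE, Sunset BOULEVARD, TREE LANE'))
```

Words outside the alternation (TREE LANE above) are left alone because the outer pattern never hands them to the inner substitution.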

edited Sep 28 '16 at 10:08

answered Sep 27 '16 at 20:08

Casimir et Hippolyte


dawg · Answer 3 · 2016-09-27T20:28:33.023

In Python, you can use a callback that looks the match up in a dictionary, like so:

>>> import re
>>> abbr = {'STREET': 'ST', 'AVENUE': 'AVE', 'BOULEVARD': 'BLVD'}
>>> re.sub(r'(STREET|AVENUE|BOULEVARD)', lambda m: abbr[m.group(1)], 'Fourth STREET')
'Fourth ST'

In Perl, you can do:

use strict;
use warnings;

my %abs=(
    'STREET', 'ST',
    'AVENUE' ,'AVE',
    'BOULEVARD', 'BLVD'
);
$_='Fourth STREET';
# A single capture group, so $1 holds whichever word matched
s/(STREET|AVENUE|BOULEVARD)/$abs{$1}/ && print "$_\n";

score 1 · Answer 4 · answered Sep 27 '16 at 19:50


It depends on the language or tool you are using. For example, in Notepad++ you can replace

(STREET)|(AVENUE)|(BOULEVARD)

with:

(?1ST)(?2AVE)(?3BLVD)
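Outside Notepad++, a rough Python equivalent of these conditional replacements (my own sketch, not Notepad++ syntax) is a callback that checks which numbered group took part in the match. `m.lastindex` is the index of the last matched group, which here is the only one; `REPLACEMENTS` and `conditional_replace` are hypothetical names.

```python
import re

PATTERN = re.compile(r'(STREET)|(AVENUE)|(BOULEVARD)')
# Group number -> replacement, mirroring (?1ST)(?2AVE)(?3BLVD)
REPLACEMENTS = {1: 'ST', 2: 'AVE', 3: 'BLVD'}

def conditional_replace(text):
    # Exactly one group participates per match, so lastindex identifies it
    return PATTERN.sub(lambda m: REPLACEMENTS[m.lastindex], text)

print(conditional_replace('Fourth STREET, Park AVENUE, Sunset BOULEVARD'))
```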

answered Sep 27 '16 at 19:50

logi-kal


Chad Davis · Answer 5 · 2016-09-28T15:21:06.240

Well, the first two substrings aren't too difficult:

import re

s = 'street'; a = 'avenue'; b = 'boulevard'

re.sub(r'(str)eet|(ave)nue|(boulevard)', r'\1 \2 \3', s)
re.sub(r'(str)eet|(ave)nue|(boulevard)', r'\1 \2 \3', a)
re.sub(r'(str)eet|(ave)nue|(boulevard)', r'\1 \2 \3', b)

The last three lines return the captured text plus the spaces from the replacement template, because groups that weren't matched expand to empty strings (Python 3.5+; older versions raise an "unmatched group" error instead): 'str  ', ' ave ' and '  boulevard'. Getting 'blvd' from 'boulevard' would need further processing of the captured string. That's reasonable, though, since extracting a set of substrings from 'boulevard' is a separate problem from capturing and replacing one of a set of alternative regexes.

Perhaps, since this way already requires the extra step of removing whitespace, one could do something like this:

# with boulevard
new_str = re.sub(r'(str)eet|(ave)nue|(b)oulevard', r'\1 \2 \3lvd', b)
re.sub(r'\s+|\blvd', '', new_str)

# with avenue
new_str = re.sub(r'(str)eet|(ave)nue|(b)oulevard', r'\1 \2 \3lvd', a)
re.sub(r'\s+|\blvd', '', new_str)

The code looks kinda funny though.
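As the comments on this answer note, stripping every whitespace character breaks on full sentences. One possible alternative, sketched under the assumption that lowercase input is fine, keeps the same pattern but uses a callback that returns whichever group matched, so no cleanup pass is needed. `pick_group` is a hypothetical name.

```python
import re

PATTERN = re.compile(r'(str)eet|(ave)nue|(b)oulevard')

def pick_group(m):
    # Only one alternative matches, so only one of these groups is not None
    if m.group(1):
        return m.group(1)
    if m.group(2):
        return m.group(2)
    # Third alternative: rebuild 'blvd' from the captured 'b'
    return m.group(3) + 'lvd'

print(PATTERN.sub(pick_group, 'main street, park avenue, sunset boulevard'))
```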

Hmm... How does [this example at regex101 strike you](https://regex101.com/r/38q300/2)? — SamWhan, Sep 28 '16 at 14:10
@ClasG, as I said, funny (not good). That's why I added the line of code that removes any whitespace, or any 'lvd' with a word boundary immediately to its left. – Chad Davis, Sep 28 '16 at 15:29
@ClasG, ah, I see. My test cases (the variables s, a, and b) didn't cover full sentences, and I see that that's pretty unrealistic. — Chad Davis, Sep 28 '16 at 15:38
