Replace multiple patterns, but not with the same string

Question

Is it possible to replace multiple patterns with different values in a single command? Let's say I have

A B C D ABC

and I want to change every A to 1, every B to 2, and every C to 3,

so the output will be

1 2 3 D 123

Since I have 3 patterns to change, I would like to avoid substituting them separately. I thought there would be something like

sed -r s/'(A|B|C)'/(1|2|3)/

but of course this just replaces A or B or C with the literal text (1|2|3). I should mention that my real patterns are more complicated than that...

thank you!

What is the (unwritten) constraint that rules out several `s///` commands, especially with complex patterns, as @anubhava asks? – NeronLeVelu, Apr 13 '15 at 13:42
This question is not exactly the same as the linked duplicate. The linked one is a sub-case of this: only some specific simple patterns replaced by a unique new pattern, whereas this question is more generic about search-and-replace patterns. – NeronLeVelu, Apr 13 '15 at 14:01
If you need "words" you should post an example that uses "words", not just letters, as letters are MUCH simpler to do (`tr`) and the right way to handle "words" really depends on what a "word" means to you and/or what the separators can be between the "words". As written right now your question is extremely likely to produce a solution that works for your posted input but will fail (possibly quietly and/or cryptically and/or disastrously) later when run against some different input. – Ed Morton, Apr 13 '15 at 15:42

hek2mgl · Answer 1 · 2019-10-15T07:34:52.463

23

Easy in sed:

sed 's/WORD1/NEW_WORD1/g;s/WORD2/NEW_WORD2/g;s/WORD3/NEW_WORD3/g'

You can separate multiple commands on the same line by a ;

Update

Probably this was too easy. NeronLeVelu pointed out that the above command can lead to unwanted results because the second substitution might even touch results of the first substitution (and so on).

If you care about this, you can avoid the side effect with the t command. The t command branches to the end of the script, but only if a substitution has happened since the last input line was read, so the remaining substitutions are skipped for that line:

sed 's/WORD1/NEW_WORD1/g;t;s/WORD2/NEW_WORD2/g;t;s/WORD3/NEW_WORD3/g'
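
To see how the two commands differ, and the limit rici points out in the comments below, it helps to run both against an input where one replacement produces text that the next pattern matches. The following is only an illustrative sketch: the mappings A -> B and B -> C and the input line are made up for the demonstration, and the commands are passed with -e so that the label-less t is portable across sed implementations.

```shell
# Hypothetical mappings for illustration only: A -> B and B -> C.
input='A B'

# Plain chain: the second s/// also rewrites the B produced by the first one.
chained=$(printf '%s\n' "$input" | sed -e 's/A/B/g' -e 's/B/C/g')

# With t: the script ends after the first successful substitution,
# so the original B on the same line is never replaced.
branched=$(printf '%s\n' "$input" | sed -e 's/A/B/g' -e t -e 's/B/C/g')

printf 'chained:  %s\nbranched: %s\n' "$chained" "$branched"
# chained:  C C
# branched: B B
```

Neither variant yields B C, which is what a true parallel replacement would give. The t version is therefore only safe when at most one of the patterns can occur on any given line.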

edited Oct 15 '19 at 07:34

answered Apr 13 '15 at 13:40

assuming that no replacement produces text that a following search pattern matches (ex: A -> BABY then B -> UNWANTED) – NeronLeVelu Apr 13 '15 at 13:57
I don't get you. Can you elaborate? – hek2mgl Apr 13 '15 at 13:58
I think @NeronLeVelu means that if an earlier substitution _results_ in something that a later substitution's _regex_ matches, you'll get undesired double substitution. – mklement0 Apr 13 '15 at 14:00
Ok got it, yes this can happen. We could circumvent this using the `t` command. Let me add that. – hek2mgl Apr 13 '15 at 14:01
@hek2mgl exactly. It is due to the changes being sequential and not parallel (which is what an *OR* does). Now, I'm sure the solution is OK for 99.9% of the cases, so it's not a real issue – NeronLeVelu Apr 13 '15 at 14:04
Added that to the answer – hek2mgl Apr 13 '15 at 14:10
Unlikely to be useful without word boundaries (e.g. try to replace `the` with `a` in the string `there is the problem`). – Ed Morton Apr 13 '15 at 15:45
It depends. I can also imagine a lot of use cases where word boundaries aren't useful. Let's wait what OP says. – hek2mgl Apr 13 '15 at 15:51
Using a conditional branch prevents two different patterns being replaced on the same line. Fwiw: here's a solution written (on a different SE site) before this question was asked here: https://unix.stackexchange.com/a/137932/24557 – rici Oct 15 '19 at 23:53

choroba · Accepted Answer · 2015-04-13T15:58:55.323

3

Easy in Perl:

perl -pe '%h = (A => 1, B => 2, C => 3); s/(A|B|C)/$h{$1}/g'

If you use more complex patterns, put the more specific ones before the more general ones in the alternation, because Perl tries the alternatives from left to right and takes the first one that matches. Sorting by length might be enough:

perl -pe 'BEGIN { %h = (A => 1, AA => 2, AAA => 3);
              $re = join "|", sort { length $b <=> length $a } keys %h; }
          s/($re)/$h{$1}/g'

To add word boundaries or line boundaries, change the pattern to, respectively,

/\b($re)\b/
# or
/^($re)$/

edited Apr 13 '15 at 15:58

answered Apr 13 '15 at 13:38

Unlikely to be useful without word boundaries (e.g. try to replace `the` with `a` in the string `there is the problem`). It's also not great that you need to list the words to look for twice - once when creating the mapping and then again in the `s//`. – Ed Morton Apr 13 '15 at 15:47
@EdMorton: You can usually omit the second list by using `join '|', sort { length $b <=> length $a } keys %h`. You can also `map "\\b$_\\b"` or `\b($re)\b` to add word boundaries. – choroba Apr 13 '15 at 15:49
Would you mind editing the answer to show that as an alternative complete solution? – Ed Morton Apr 13 '15 at 15:56
Thanks. What would the `/^($re)$/` be used for? `\b` seems to work even when the RE is at the start of a line. Why not just use `\b` in the RE instead of mentioning it as an option - is there some case where it would not produce the desired behavior or is it non-portable or something else? – Ed Morton Apr 13 '15 at 16:13
I like the idea of sorting by length, btw - never thought of that before, very interesting approach that could noticeably simplify coding the solution to this problem! – Ed Morton Apr 13 '15 at 16:20
@EdMorton I think it is funny that I heard *sorting by length* suggested two times today (and never before). I like it too. However, the concept is also known from `flex` files, where you define the longest patterns at the top of the definitions and so on. – hek2mgl Apr 13 '15 at 16:26
@EdMorton: I just remember I used it that way once. Maybe `\b` would work well, too, but why would one write `\b` when they need `^$`? – choroba Apr 13 '15 at 16:29
@choroba you wouldn't but I was thinking the opposite - why would you add `^$` when you need `\b`? `^` and `$` are string boundaries, btw, not line boundaries. In many tools by default input strings (records in awk terminology) start/end on line boundaries so the terminology gets munged but in an RE `(^|\n)` actually represents start-of-line and `(\n|$)` represents end-of-line assuming `\n` line-endings. I expect that matters in sed when using that hold-space thingy. – Ed Morton Apr 13 '15 at 17:01
While *a* solution to the problem (and obviously helpful to the OP as he accepted it), I consider it problematic to have a question *titled* "sed: ..." and *tagged* "sed" answered by "this is how you do it in Perl". – DevSolar Oct 15 '19 at 07:40

Ed Morton · Answer 3 · 2015-04-13T16:08:19.093

2

This will work if your "words" don't contain RE metachars (. * ? etc.):

$ cat file
there is the problem when the foo is closed

$ cat tst.awk
BEGIN {
    split("the a foo bar",tmp)
    for (i=1;i in tmp;i+=2) {
        old = (i>1 ? old "|" : "\\<(") tmp[i]
        map[tmp[i]] = tmp[i+1]
    }
    old = old ")\\>"
}
{
    head = ""
    tail = $0
    while ( match(tail,old) ) {
        head = head substr(tail,1,RSTART-1) map[substr(tail,RSTART,RLENGTH)]
        tail = substr(tail,RSTART+RLENGTH)
    }
    print head tail
}

$ awk -f tst.awk file
there is a problem when a bar is closed

The above obviously maps "the" to "a" and "foo" to "bar" and uses GNU awk for word boundaries.

If your "words" do contain RE metachars etc. then you need a string-based solution using index() instead of an RE based one using match() (note that sed ONLY supports REs, not strings).
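
A string-based variant along those lines might look like the sketch below. It is not Ed Morton's code: the function name replace_literals and the PAIRS environment variable are hypothetical, the pairs are space-separated (so search strings cannot contain spaces), and picking the earliest match with the longest string winning ties is one possible policy among several. The list is passed through ENVIRON rather than -v so that awk does not interpret backslashes in it. Word boundaries are not handled.

```shell
# replace_literals "old1 new1 old2 new2 ..."  (hypothetical helper)
# Reads lines on stdin and replaces each literal old string with its new
# string in one left-to-right pass, so replaced text is never rescanned.
replace_literals() {
    PAIRS="$1" awk '
    BEGIN {
        # Default whitespace splitting never yields empty search strings.
        n = split(ENVIRON["PAIRS"], tmp, " ")
        for (i = 1; i < n; i += 2) {
            cnt++
            old[cnt] = tmp[i]
            rep[cnt] = tmp[i + 1]
        }
    }
    {
        head = ""
        tail = $0
        while (tail != "") {
            # Earliest literal match wins; on a tie the longest string wins.
            pos = 0
            best = 0
            for (k = 1; k <= cnt; k++) {
                p = index(tail, old[k])
                if (p && (!pos || p < pos ||
                          (p == pos && length(old[k]) > length(old[best])))) {
                    pos = p
                    best = k
                }
            }
            if (!pos) break
            head = head substr(tail, 1, pos - 1) rep[best]
            tail = substr(tail, pos + length(old[best]))
        }
        print head tail
    }'
}

printf '%s\n' 'ab' | replace_literals 'a b b c'                   # prints: bc
printf '%s\n' 'x.*y a+b' | replace_literals '.* STAR a+b SUM'     # prints: xSTARy SUM
```

Because index() compares plain strings, characters such as . * + need no escaping, and because the scan only moves forward through the line, the mappings a -> b and b -> c turn ab into bc rather than cc.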

edited Apr 13 '15 at 16:08

answered Apr 13 '15 at 15:52

Funny example! :) To make it work even if the words contain metacharacters one could pre-process the search words and escape meta-characters. – hek2mgl Apr 13 '15 at 16:12
@hek2mgl no, you cannot do that. That's the usual solution touted by sed folks trying to make sed work on strings but it cannot be done as trying to do that can end up introducing syntax errors or converting `t` into a tab, etc. It ends up just a mess. Just use string functions for string operations - the only downside is having to then identify word boundaries.. – Ed Morton Apr 13 '15 at 16:15
Is it proven that it is not possible to escape a string which contains metacharacters reliably with `sed`? I'm just asking. Using `awk`'s `index()` looks like the better alternative anyway. – hek2mgl Apr 13 '15 at 16:21
@hek2mgl I haven't seen a proof that it's impossible, but every time someone has posted an approach to doing it I or someone else has come up with some input string that breaks that approach. – Ed Morton Apr 13 '15 at 16:30
Ah, ok. If it is not proven already I'll play around a bit (and likely fail) – hek2mgl Apr 13 '15 at 16:33
There's also the "why bother?" argument since tools that operate on strings do exist :-). Something to consider is that this usually comes up in the context of `sed 's/search/replace/'` since sed has no ability to handle strings, so think about not just what you need to escape in the search position (delimiters and RE metachars) but also in the replacement position (delimiters and capture group expansions `&`, `\`). The simplest case might be to solve just the search for awk `match()` since it doesn't care about delimiters and you can use substr() to replace the matching string as-is. – Ed Morton Apr 13 '15 at 16:51
@hek2mgl IMHO it'd be well worth you posting a new question like "is there a case where escaping metachars doesn't work?" and posting your attempt there so we can all chime in and there's one reference spot for whatever the outcome is. – Ed Morton Apr 13 '15 at 16:55
I have not asked a question in years! :) OK! I will prepare it. (First I need an attempt.) I hope I don't get a rain of downvotes! hihi – hek2mgl Apr 13 '15 at 16:57
Seems this has been answered already: http://unix.stackexchange.com/questions/32355/escaping-of-meta-characters-in-basic-extended-posix-regex-strings-in-grep – hek2mgl Apr 13 '15 at 17:18
The accepted "answer" posted there would fail if the `raw_string` contained a `/`. I'm sure there's other cases too, let me think about it a bit... – Ed Morton Apr 13 '15 at 18:27
It won't handle RE intervals at all, e.g. `{3}` would be passed through untouched. You'd also have to be careful when calling sed to be sure to use `-r` or `-E` to invoke EREs or the posted solution will convert `+` to `\+`, which changes a literal `+` INTO an RE metachar in BREs in some seds. – Ed Morton Apr 13 '15 at 18:38
I should have had a closer look there! – hek2mgl Apr 13 '15 at 18:45
It did get me thinking though and I think my biggest reason for nay-saying this type of approach is that it's so context sensitive. The characters you need to escape are so dependent on the tool you are using, the options you are giving it, whether you are using BREs or EREs or something else, etc. I think if any approach is going to succeed it'd be putting every [RE meta] char inside `[]` so that for example `+` becomes `[+]` which is ALWAYS a literal char, instead of `\+` which is sometimes an RE metachar but I have a feeling that will have drawbacks too. All I want is a **string** :-). – Ed Morton Apr 13 '15 at 18:51
Using `[+]` definitely sounds better than what I'm trying at the moment. Let me open the question. It is probably too broad, but I think it does not even make sense to show what I have tried (it would still be imperfect) – hek2mgl Apr 13 '15 at 19:02
http://stackoverflow.com/questions/29613304/is-it-possible-to-escape-regex-characters-reliably-with-sed :) – hek2mgl Apr 13 '15 at 19:17
@hek2mgl Interesting - I came back to this question intending to re-work my answer in light of the discussion at your recent question and found I STILL couldn't use sed in this case because of the need to continually move down the input line so if mappings `a->b` and `b->c` were defined I wouldn't change `ab` to `bb` and then `cc`, but would instead end up with the desired `bc`. – Ed Morton Apr 15 '15 at 21:33
hard to get! let me check this tomorrow. Looks like I need to play around to understand that. – hek2mgl Apr 15 '15 at 22:22
@hek2mgl I also just re-discovered you can't use word boundaries around a string that isn't bounded by word constituent characters, e.g. `/\<foo\>/` matches on `foo` but `/\<[.][*]\>/` will not match on `.*` so now I'm scratching my head a little about what I'm actually trying to accomplish here! I was hoping we'd end up with a solution that will let you replace ANY "word" (where I had a vague idea of a word being anything between word delimiters) by escaping all the RE metachars and then using word-delimiters but that's just not possible. Need to think about it a lot more! – Ed Morton Apr 15 '15 at 22:26
You are indeed right. `sed 's/a/b/;s/b/c/' <<< 'ab'` delivers `cb`, which is likely not the desired result. I should have taken this into account. With an example as simple as `ab` this is easier to see! Let me get to your second comment. – hek2mgl Apr 16 '15 at 21:31
^^^ Probably tomorrow. Got an invitation to drink a beer... :D – hek2mgl Apr 16 '15 at 21:36

score 0 · Answer 4 · answered Jan 27 '22 at 16:41

Replace with a callback function in JavaScript

Similar to the Perl solution by choroba:

var i = 'abcd'
var r = {ab: "cd", cd: "ab"}

var o = i.replace(/ab|cd/g, (...args) => r[args[0]])

o == 'cdab'

can be optimized with capture groups like /(ab)|(cd)/g and checking args[i] for undefined values

score 0 · Answer 5 · answered Jun 06 '23 at 04:57

Using Raku (formerly known as Perl_6)

Adapting @Choroba's elegant (first) Perl answer, below expressed in Raku:

~$ raku -pe 'my %h = (a => 1, b => 2, c => 3); s:g/ (a|b|c) /%h{"$0"}/ ;'  file

#OR

~$ raku -pe 'my %h = (a => 1, b => 2, c => 3); s:g[ (a|b|c) ] = "%h{$0}" ;'  file

It should be noted that in Raku, the single | alternation-pipe denotes "Longest-Token-Matching" alternation. If you want the Perl(5) behavior ("first-listed is first-substituted, etc.") in Raku, you use the double || alternation-pipe.

In contrast, if you would rather do sequential substitution in Raku, examples can be found at "Concatenating `s///` in raku".

References:
https://docs.raku.org/language/5to6-nutshell#Longest_token_matching_(LTM)_displaces_alternation
https://docs.raku.org/language/regexes#Longest_alternation:_|
https://raku.org
