
I am trying to take a PCRE regex and use it in sed, but I'm running into some issues. Please note that this question is representative of a bigger issue (how to convert a PCRE regex to work with sed), so it is not only about the example below but about how to use PCRE regexes in sed in general.

This example extracts an email address from a line and replaces it with "[emailaddr]".

echo "My email is abc@example.com" | sed -e 's/[a-zA-Z0-9]+[@][a-zA-Z0-9]+[\.][A-Za-z]{2,4}/[emailaddr]/g'

I've tried the following search regexes:

([a-zA-Z0-9]+[@][a-zA-Z0-9]+[\.][A-Za-z]{2,4})
[a-zA-Z0-9]+[@][a-zA-Z0-9]+[\.][A-Za-z]{2,4}
([a-zA-Z0-9]+[@][a-zA-Z0-9]+[.][A-Za-z]{2,4})
[a-zA-Z0-9]+[@][a-zA-Z0-9]+[.][A-Za-z]{2,4}

I've tried changing the sed delimiter from s/find/replace/g to s|find|replace|g, as outlined here (stack overflow: pcre regex to sed regex).

I still can't figure out how to use a PCRE regex in sed, or how to convert one to sed syntax. Any help would be great.

Sugitime
  • Note that `.` is not special in `[brackets]`, so you don't have to escape it: `[.]` is fine. Also, `@` is not special in regular expressions at all, so you don't need to put it in brackets (unless you like the way it looks) – glenn jackman Jul 18 '14 at 19:40
  • Just a tip: you can use perl in a way very similar to sed (syntax-wise), and of course it supports PCRE: `perl -pe 's/oldstring/newstring/'` – Tiago Lopo Jul 18 '14 at 22:28
  • -1 Your question is wrong and you never corrected it. Also, a program (not sed) has been written that supports PCRE and makes changes in text files from the command line: http://superuser.com/questions/339118/regex-replace-from-command-line – barlop Apr 11 '16 at 10:19
  • The regexes in the question do not use any feature of PCRE which is not in ERE. So they are just extended regular expressions (ERE). Many modern `sed` implementations support ERE - even the specification in the POSIX draft describes the option as `sed -E`. – pabouk - Ukraine stay strong Nov 04 '21 at 16:14

5 Answers


Want PCRE (Perl Compatible Regular Expressions)? Why don't you use perl instead?

perl -pe 's/[a-zA-Z0-9]+[@][a-zA-Z0-9]+[\.][A-Za-z]{2,4}/[emailaddr]/g' \
    <<< "My email is abc@example.com"

Output:

My email is [emailaddr]

Write output to a file with tee:

perl -pe 's/[a-zA-Z0-9]+[@][a-zA-Z0-9]+[\.][A-Za-z]{2,4}/[emailaddr]/g' \
    <<< "My email is abc@example.com" | tee /path/to/file.txt > /dev/null
Rockallite
  • This is a really good point about using perl. This works great when we are not processing large amounts of data and speed is not important. Because sed is fast. Like really fast! – oᴉɹǝɥɔ Mar 11 '19 at 21:53
  • @cherio and you think Perl is slow? Have you done any measurements? You won't notice too much of a difference. Because Perl is fast. Like really fast. – Endrju Jun 26 '19 at 02:14
  • On a Core i7-2600 under Ubuntu/WSL2, using the "extended regex" `"s/to_date\(('.{19}'),'YYYY-MM-DD HH24:MI:SS'\)/CONVERT(DATETIME,\1,120)/g"` on a 100 MB file, perl substantially outperforms sed -E and runs approximately 10 times faster. – David Le Borgne Sep 27 '21 at 12:40
  • "Don't have Perl and want to use PCRE, why don't you just use Perl?" I'm sorry, but this is one of the most classic non-answers I've seen in a while. – 1337user Mar 28 '23 at 13:51

Use the -r flag to enable extended regular expressions (use -E instead of -r on OS X).

echo "My email is abc@example.com" | sed -r 's/[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[A-Za-z]{2,4}/[emailaddr]/g'

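A small sketch of why the flag matters, assuming a sed that accepts -E (GNU sed and the OS X/FreeBSD sed both do). Without it, sed reads the pattern as a basic regex in which `+` and `{2,4}` are ordinary characters, so nothing matches and the line passes through unchanged. The function names are hypothetical.

```shell
# Hypothetical helper: -E selects extended regexes, so + and {2,4} are quantifiers.
mask_email() {
    sed -E 's/[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[A-Za-z]{2,4}/[emailaddr]/g'
}

# Same pattern without -E: as a basic regex, + and {2,4} are literal text,
# so the pattern never matches and the input is printed unchanged.
mask_email_bre() {
    sed 's/[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[A-Za-z]{2,4}/[emailaddr]/g'
}

echo "My email is abc@example.com" | mask_email      # My email is [emailaddr]
echo "My email is abc@example.com" | mask_email_bre  # My email is abc@example.com
```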

hwnd
  • I get 'sed: illegal option -- r' – Sugitime Jul 18 '14 at 19:39
  • Never mind, it's just a weird Mac thing. It works on my Linux box, so I'm set. Thank you so much! – Sugitime Jul 18 '14 at 19:40
  • On the Mac, you want `-E` instead: https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/sed.1.html – glenn jackman Jul 18 '14 at 19:44
  • Note that `-E` works with GNU `sed` too (as an undocumented alias of `-r`). – mklement0 Jul 18 '14 at 19:47
  • That's great to know. I thought my regex was wrong; it didn't even occur to me that it wasn't being interpreted as an extended regex. – Sugitime Jul 18 '14 at 19:49
  • Are extended regular expressions (via the -r switch) the same thing as Perl Compatible Regular Expressions (PCRE, per the question)? – Alex Hall Sep 17 '15 at 23:42
  • -1. @AlexHall No, they are not the same thing, and this answer is misleading (it doesn't address the PCRE in the title). PCRE (last time I checked) is not supported in sed. The most sed supports is ERE (extended), which requires sed -r; without -r it's just BRE (basic). grep supports PCRE (with -P), but sed doesn't. The question is misleading too: the title says PCRE, but the regex only requires ERE. – barlop Apr 11 '16 at 10:16
  • `-E` should certainly be preferred. Modern implementations of `sed` (including GNU sed) follow the current POSIX draft and support ERE using `sed -E`. As barlop has already said, the question contains only ERE regexes, so there is no need for PCRE here. – pabouk - Ukraine stay strong Nov 04 '21 at 16:18

GNU sed uses basic regular expressions or, with the -r flag, extended regular expressions.

Your regex as a POSIX basic regex (thanks mklement0):

[[:alnum:]]\{1,\}@[[:alnum:]]\{1,\}\.[[:alpha:]]\{2,4\}

Note that this expression will not match all email addresses (not by a long shot).
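The basic regex above, dropped into a complete command. This is a sketch assuming a POSIX-conforming sed; since it uses no -E or -r, it should behave the same on GNU and FreeBSD/OS X sed. The function name is hypothetical.

```shell
# Hypothetical helper using only POSIX basic-regex syntax:
# \{1,\} stands in for ERE's +, and the interval \{2,4\} needs escaped braces in BRE.
mask_email_posix() {
    sed 's/[[:alnum:]]\{1,\}@[[:alnum:]]\{1,\}\.[[:alpha:]]\{2,4\}/[emailaddr]/g'
}

echo "My email is abc@example.com" | mask_email_posix   # My email is [emailaddr]
```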

glenn jackman
  • Since the OP also seems to use OS X: note that `sed` there (FreeBSD `sed`) uses _POSIX_ basic regexes, where `\+` and `\?` are _not_ supported - use `\{1,\}` and `\{0,1\}` instead. – mklement0 Jul 18 '14 at 19:53
  • That's also good to know. I'm in a tough spot because my dev machine is OS X and my prod machine is Linux, but I'll keep that in mind. Thanks! – Sugitime Jul 18 '14 at 19:56
  • @Sugitime: If you find yourself having to use both GNU `sed` and FreeBSD `sed` regularly, here's a summary of the differences: http://stackoverflow.com/a/24276470/45375 – mklement0 Jul 18 '14 at 20:08

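For multi-line matches, add -0, which sets perl's input record separator to the null character, so a file without NUL bytes is read as a single record and patterns can span lines:

perl -0pe 's/search/replace/gms' file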

DataYoda

Sometimes this can also help as a workaround:

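# Assumes a single match that contains no characters special to sed (such as / . * [)
str=$(grep -Poh "pcre-pattern" file)
# -i edits in place (GNU sed syntax; FreeBSD/OS X sed expects -i '')
sed -i "s/$str/$something_else/" file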

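-P, --perl-regexp: Interpret the pattern as a Perl-compatible regular expression (PCRE).
-o, --only-matching: Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
-h, --no-filename: Suppress the prefixing of file names on output.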
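A slightly more defensive sketch of the same workaround, assuming GNU grep built with PCRE support (for -P). The lookbehind `(?<=@)` is PCRE-only syntax; the matched text is then escaped so that characters such as `.` or `/` are taken literally by sed. The temporary file and the `[domain]` placeholder are for illustration only, and the escaping command is a common idiom, not part of the original answer.

```shell
# Hypothetical input file created just for this demo.
file=$(mktemp)
printf 'My email is abc@example.com\n' > "$file"

# PCRE lookbehind: match the text after @ without including the @ itself.
# head -n 1 keeps only the first match; several matches would put newlines into the sed script.
str=$(grep -Po '(?<=@)[[:alnum:].]+' "$file" | head -n 1)

# Backslash-escape characters sed would treat as regex syntax or as the / delimiter.
escaped=$(printf '%s\n' "$str" | sed -e 's/[]\/$*.^[]/\\&/g')

result=$(sed "s/$escaped/[domain]/" "$file")
echo "$result"   # My email is abc@[domain]
rm -f "$file"
```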

aderchox