I've been learning sed and grep for a couple of weeks now. I usually build the regex in the Atom editor, which has helped me a lot; it no longer takes me more than a few minutes to build one.
But things get ugly when I try to use the same regex against a data file in the Ubuntu terminal.

Could someone please provide the precise switches for grep and sed, along with their limitations (for example, GNU sed cannot use \d for a digit and uses [0-9] instead)?
Let's take the text and requirements below as an example:
192.168.10.10,fe80:0:0:0:bcf6:c04e:cb99:6909,10.0.170.11
172.16.32.44
fe80:0:0:0:84a5:1d2e:55d1:ecf,192.168.4.50
fe80:0:0:0:84a5:1d2e:55d1:ec1
10.10.101.22

After racking my brain for hours I figured out grep -P '(\d{1,3}\.){3}\d{1,3}' to print only the IPv4 addresses. But -P is the Perl regex switch, so now I am thoroughly confused about what to use and what not.
Please help me build full sed and grep commands for the requirements below (assuming the input is a file):

1- Print only IPV4 addresses using GREP.
2- Print everything except IPV4 addresses using GREP.
3- Print only IPV4 addresses using SED.
4- Print everything except IPV4 addresses using SED.
5- Replace IPV4 addresses with --- using SED.
6- Replace everything except IPV4 addresses using SED.
user2593869
  • `sed` is not optimal for this task. You could do something like `sed -nE 's/^/\t/; s/[^.0-9]/\t/g; s/$/\t/; s/\t[^.]*\t/\t/g; s/\t([0-9.]*)\t/\1\n/g;p' ur_file | sed -n '/[0-9]/p'` for ip4 addresses – dawg Feb 07 '21 at 16:43
  • What is the point of doing the same task with `sed` and then with `grep`? There are no "magic switches" that replace sed with grep or grep with sed. `Replace everything except IPV4 addresses using SED`: replace it with what? – KamilCuk Feb 07 '21 at 16:45
  • I remember sed being used for printing, but I am not sure how to add the regex to this: ```cat somefile | sed -rn '/(expression-here)/p'``` – user2593869 Feb 08 '21 at 11:27

1 Answer

From question Validating IPv4 addresses with regexp:

ipv4='((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

The regex is fine to use as an Extended Regular Expression (ERE), which is what grep -E and sed -E expect.
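
To illustrate the flavour question, here is a small sketch using the looser pattern from the question (1 to 3 digits per octet, no range check) on one of the sample lines. It assumes GNU grep for the -o option. grep's default flavour is Basic Regular Expressions (BRE), where groups and intervals need backslashes; -E switches to ERE, where they do not. Neither flavour understands \d, which is why [0-9] is used; \d needs the Perl flavour (-P).

```shell
# One sample line from the question.
line='fe80:0:0:0:84a5:1d2e:55d1:ecf,192.168.4.50'

# BRE (grep's default): \( \) for groups, \{ \} for intervals.
bre=$(printf '%s\n' "$line" | grep -o '\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}')

# ERE (grep -E): the same pattern without the backslashes.
ere=$(printf '%s\n' "$line" | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}')

printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
# BRE: 192.168.4.50
# ERE: 192.168.4.50
```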

1- Print only IPV4 addresses using GREP.

grep -Eo "$ipv4"
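
As a worked example, this is what the command does with the sample data from the question. The sketch assumes GNU grep (for -o) and feeds the data through a pipe; with a real file you would pass the file name as the last argument instead.

```shell
ipv4='((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Sample data copied from the question.
sample='192.168.10.10,fe80:0:0:0:bcf6:c04e:cb99:6909,10.0.170.11
172.16.32.44
fe80:0:0:0:84a5:1d2e:55d1:ecf,192.168.4.50
fe80:0:0:0:84a5:1d2e:55d1:ec1
10.10.101.22'

# -E selects ERE, -o prints each match on its own line.
out=$(printf '%s\n' "$sample" | grep -Eo "$ipv4")
printf '%s\n' "$out"
# 192.168.10.10
# 10.0.170.11
# 172.16.32.44
# 192.168.4.50
# 10.10.101.22
```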

2- Print everything except IPV4 addresses using GREP.

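I believe grep cannot print the "everything except" part of a line: -o prints only the matching parts, and -v inverts the match for whole lines, so grep -Ev "$ipv4" prints only the lines that contain no IPv4 address at all.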
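
The closest grep gets is inverting the match per line with -v. A short sketch on the question's sample data shows the effect: lines that contain any IPv4 address are dropped as a whole, including their IPv6 parts.

```shell
ipv4='((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Sample data copied from the question.
sample='192.168.10.10,fe80:0:0:0:bcf6:c04e:cb99:6909,10.0.170.11
172.16.32.44
fe80:0:0:0:84a5:1d2e:55d1:ecf,192.168.4.50
fe80:0:0:0:84a5:1d2e:55d1:ec1
10.10.101.22'

# -v keeps only the lines on which the pattern does not match anywhere.
out=$(printf '%s\n' "$sample" | grep -Ev "$ipv4")
printf '%s\n' "$out"
# fe80:0:0:0:84a5:1d2e:55d1:ec1
```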

3- Print only IPV4 addresses using SED.

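Write a sed script that puts a newline before and after every IPv4 address (a newline cannot otherwise occur in the pattern space, so it is a safe marker). Marking only the end is not enough, because a greedy [^\n]* in front of the address would swallow its leading digits. Then, for each stretch of text followed by a marked address, keep only the address and its trailing newline. Finally, drop the last newline together with whatever follows it, and print only the lines where that substitution happened. With GNU sed, something along the lines of:

sed -nE "s/($ipv4)/\n&\n/g; s/[^\n]*\n([^\n]*)\n/\1\n/g; s/\n[^\n]*$//p"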
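
To see the chop-and-filter idea on the question's sample data, here is a sketch that assumes GNU sed (for \n in the replacement and inside brackets). It wraps each address in newlines on both sides, keeps only the wrapped parts, and prints one address per output line, so the result matches what grep -Eo gives.

```shell
ipv4='((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Sample data copied from the question.
sample='192.168.10.10,fe80:0:0:0:bcf6:c04e:cb99:6909,10.0.170.11
172.16.32.44
fe80:0:0:0:84a5:1d2e:55d1:ecf,192.168.4.50
fe80:0:0:0:84a5:1d2e:55d1:ec1
10.10.101.22'

# 1. wrap every address in newlines
# 2. for each "text, newline, address, newline" keep "address, newline"
# 3. cut the last newline plus any trailing text; p prints only if that happened
out=$(printf '%s\n' "$sample" |
  sed -nE "s/($ipv4)/\n&\n/g; s/[^\n]*\n([^\n]*)\n/\1\n/g; s/\n[^\n]*$//p")
printf '%s\n' "$out"
# 192.168.10.10
# 10.0.170.11
# 172.16.32.44
# 192.168.4.50
# 10.10.101.22
```

Lines without any address never receive a newline marker, so the last substitution does not fire and -n keeps them out of the output.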

4- Print everything except IPV4 addresses using SED.

sed -E "s/$ipv4//g"

5- Replace IPV4 addresses with --- using SED.

sed -E "s/$ipv4/---/g"
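
For points 4 and 5, a quick sketch on the first sample line shows the difference between deleting the addresses and masking them. Nothing here goes beyond the two commands above; the variable names are only for illustration.

```shell
ipv4='((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
line='192.168.10.10,fe80:0:0:0:bcf6:c04e:cb99:6909,10.0.170.11'

# Point 4: delete every address, keep the rest of the line.
removed=$(printf '%s\n' "$line" | sed -E "s/$ipv4//g")

# Point 5: put --- where each address was.
masked=$(printf '%s\n' "$line" | sed -E "s/$ipv4/---/g")

printf '%s\n%s\n' "$removed" "$masked"
# ,fe80:0:0:0:bcf6:c04e:cb99:6909,
# ---,fe80:0:0:0:bcf6:c04e:cb99:6909,---
```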

6- Replace everything except IPV4 addresses using SED.

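As in point 3, but the stretches of text around the addresses are replaced with --- instead of being dropped. To avoid inserting --- where there is nothing between two markers, first tag every stretch (N for non-address text, I for an address), then replace the non-empty N stretches, and finally strip the tags and newlines. With GNU sed, something along the lines of:

sed -E "s/($ipv4)/\n&\n/g; s/([^\n]*)\n([^\n]*)\n/N\1\nI\2\n/g; s/([^\n]*)$/N\1/; s/(^|\n)N[^\n]+/\1---/g; s/(^|\n)N/\1/g; s/\nI?//g"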
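
Here is a sketch of one way to replace everything except the addresses with ---, run on the question's sample data. It assumes GNU sed, and it assumes "replace" means each non-empty run of other text becomes a single ---. The N and I tag letters are illustrative choices, not anything sed requires.

```shell
ipv4='((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Sample data copied from the question.
sample='192.168.10.10,fe80:0:0:0:bcf6:c04e:cb99:6909,10.0.170.11
172.16.32.44
fe80:0:0:0:84a5:1d2e:55d1:ecf,192.168.4.50
fe80:0:0:0:84a5:1d2e:55d1:ec1
10.10.101.22'

# 1. wrap every address in newlines
# 2. tag each "text, address" pair: N before the text, I before the address
# 3. tag the trailing text (or the whole line if it has no address) with N
# 4. non-empty N stretches become ---
# 5. remaining N tags belong to empty stretches: remove them
# 6. remove the newlines and the I tags
out=$(printf '%s\n' "$sample" |
  sed -E "s/($ipv4)/\n&\n/g; s/([^\n]*)\n([^\n]*)\n/N\1\nI\2\n/g; s/([^\n]*)$/N\1/; s/(^|\n)N[^\n]+/\1---/g; s/(^|\n)N/\1/g; s/\nI?//g")
printf '%s\n' "$out"
# 192.168.10.10---10.0.170.11
# 172.16.32.44
# ---192.168.4.50
# ---
# 10.10.101.22
```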

The -E option (or -r, its older GNU spelling) has traditionally been an extension to POSIX sed. I doubt you'll find an implementation without it; if you do, translate the regex into a basic regular expression and it should work fine.
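
As a sketch of what such a translation can look like: POSIX basic regular expressions have no alternation, so the example below falls back to the looser pattern from the question (1 to 3 digits per octet, without the 0 to 255 range check). That substitution is my assumption, not part of the regex quoted above. Groups and intervals take backslashes, and sed is called without -E.

```shell
line='fe80:0:0:0:84a5:1d2e:55d1:ecf,192.168.4.50'

# BRE version of ([0-9]{1,3}\.){3}[0-9]{1,3}, used to mask the address.
masked=$(printf '%s\n' "$line" |
  sed 's/\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}/---/g')

printf '%s\n' "$masked"
# fe80:0:0:0:84a5:1d2e:55d1:ecf,---
```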

KamilCuk
  • Thanks a lot for taking the time to write such a detailed answer. Just a naive question: for your 3rd answer, can we match the entire line using ```.*(regex).*``` and then select the IPv4 address with a back-reference, e.g. ```'s/.*(regex).*/\1/g'```? I am not able to make it work. I would be thankful if you could spend some more time on this. – user2593869 Feb 08 '21 at 12:04
  • You could, but if there are two IPv4 addresses on a line, the first one will be removed. `.*` is greedy: it matches as much as it can, including any IPv4 address inside. I guess you could do it with a Perl negative lookahead, but POSIX regular expressions have no lookarounds. That's why I first chop the line so there is one IPv4 address per piece, then remove the unwanted parts, then fold it back together. – KamilCuk Feb 08 '21 at 12:05
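
The greedy behaviour described in the last comment can be checked with a short sketch on the first sample line. The leading `.*` takes as much text as it can while still leaving something that matches the address pattern, so the first address is lost, and the surviving one is clipped because `.*` also eats its leading digit.

```shell
ipv4='((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
line='192.168.10.10,fe80:0:0:0:bcf6:c04e:cb99:6909,10.0.170.11'

# Match the whole line and keep only the captured address.
out=$(printf '%s\n' "$line" | sed -E "s/.*($ipv4).*/\1/")

# The result is neither of the two addresses that are actually on the line.
printf '%s\n' "$out"
```

This is why the approach in the answer marks the start and end of every address before removing anything.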