
It doesn't work in a shell script, but it works fine in JavaScript. What's wrong?

I just want to filter out all IPs whose last octet is not in the range 2~254; for example, ignore 192.168.18.255.

shell script

I want to filter out the IP 192.168.18.255 using only standard regular expressions.

  1. current useful input IPs: 192.168.18.195 192.168.18.255

  2. wanted output IP: 192.168.18.195

#!/usr/bin/env bash

IPs=$(ifconfig | grep -oE '(192\.168\.1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])')
echo $IPs
# 192.168.18.195 192.168.18.255 ❌

IPs1=$(ifconfig | grep -oE '192\.168\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([2-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])')
echo $IPs1
# 192.168.18.195 192.168.18.25 ❌

IPs2=$(ifconfig | grep -oE '192\.168\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([2-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])$')
echo $IPs2
# ❌

# wanted, ignore 192.168.18.255 ❓
# IPs=$(ifconfig | grep -oE '❓')
# echo $IPs
# 192.168.18.195


This is a minimal reproducible input, just for testing.

en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=6463<RXCSUM,TXCSUM,TSO4,TSO6,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
        ether a4:83:e7:91:62:79 
        inet6 fe80::1ca2:3b0a:df9d:465f%en1 prefixlen 64 secured scopeid 0x7 
        inet 192.168.18.195 netmask 0xffffff00 broadcast 192.168.18.255
        inet6 fd80:eae6:1258:0:37:7544:1d1:7b08 prefixlen 64 autoconf secured 
        nd6 options=201<PERFORMNUD,DAD>
        media: autoselect
        status: active
$ ifconfig
# The output is too long, ignore it here, Please see the link below

The full version output of ifconfig link: https://gist.github.com/xgqfrms/a9e98b17835ddbffab07dde84bd3caa5

javascript

This is just a test to show that ignoring the last octet 255 works well in JavaScript.


function test(n) {
  let reg = /192\.168\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([2-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])$/;
  for (let i = 0; i < n; i++) {
    let result = reg.test(`192.168.18.${i}`);
    if (result) {
      // console.log(`192.168.18.${i} ✅`, i, result)
    } else {
      console.log(`192.168.18.${i} ❌`, i, result)
    }
  }
}

test(256);

192.168.18.0 ❌ 0 false
192.168.18.1 ❌ 1 false
192.168.18.255 ❌ 255 false

      


refs

https://regexper.com/

https://regex101.com/

xgqfrms
  • What is your current output of `ifconfig` and what is your expected output? – anubhava Apr 26 '23 at 08:15
  • Your input in bash and js differs. First one is lines containing ip somewhere in it, second - only ip. Try replacing `$` in your third attempt in bash with `[[:space:]]` – markalex Apr 26 '23 at 08:22
  • @markalex It's not very clear, please show your full code, It's not valid `IPs2=$(ifconfig | grep -oE '192\.168\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([2-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])[[: :]]')` – xgqfrms Apr 26 '23 at 08:37
  • Not `[[: :]]`, literally `[[:space:]]` – markalex Apr 26 '23 at 08:38
  • If your `ifconfig` output is too long to include in your question then it's too long for us to read. Please replace all the links and images with plain text that we can copy/paste to test with so we can help you. Keep in mind you're supposed to create and post a [mcve] that concisely demonstrates your problem, not just whatever data you have lying around. – Ed Morton Apr 27 '23 at 00:42
  • Because Stack Overflow's Markdown editor does not support `
    `, omitting output that is too long makes the question easier for humans to read. I have also added a minimal reproducible example.
    – xgqfrms Apr 27 '23 at 05:17

2 Answers


Since the output of your ifconfig contains not only IPs but also the text surrounding them, $ (end of line) is not the only valid marker of the end of an IP.

The simplest fix is to replace it with \b (word boundary). However, \b is not part of POSIX regular expressions, so it only works where grep supports it as an extension, as GNU grep does.

If your grep lacks support for \b, you can use ([[:space:]]|$) or [^0-9] instead. The first requires whitespace or end of line after the IP; the second requires a non-digit character after it. Note that with grep -o the character matched after the IP becomes part of the output.

Example of command with output of your ifconfig stored in file input_file.txt:

$ ifconfig > input_file.txt

$ cat input_file.txt | grep -oE '192\.168\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([2-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])([[:space:]]|$)'
192.168.18.195 

Demo at regex101.
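The difference between the unanchored and the anchored pattern can be reproduced without running ifconfig. The following is only a minimal sketch: the variable names `sample`, `ip_re`, `unanchored`, and `anchored` are made up for illustration, and the sample line is copied from the question.

```shell
#!/usr/bin/env bash

# Sample line taken from the ifconfig output in the question.
sample='inet 192.168.18.195 netmask 0xffffff00 broadcast 192.168.18.255'

# IP pattern from the question; the last octet is limited to 2..254.
ip_re='192\.168\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([2-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])'

# Without anything after the last octet, grep -o is free to match
# only a prefix of ".255".
unanchored=$(printf '%s\n' "$sample" | grep -oE "$ip_re")

# Requiring whitespace or end of line after the last octet rejects ".255".
# The matched whitespace character is part of the -o output.
anchored=$(printf '%s\n' "$sample" | grep -oE "$ip_re"'([[:space:]]|$)')

printf 'unanchored:\n%s\n' "$unanchored"
printf 'anchored: "%s"\n' "$anchored"
```

The unanchored run reports 192.168.18.25 as a second match, which is the same truncated address seen in the question's second attempt.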

xgqfrms
markalex
  • Probably avoid the [useless use of `cat`](https://stackoverflow.com/questions/11710552/useless-use-of-cat). A simpler fix is to use `grep -w` to avoid having matches cross word boundaries. Perhaps also explore `grep -F file` which lets you put the patterns in a file (maybe then also switch to using `grep -f`). – tripleee Apr 26 '23 at 09:23
  • @tripleee, cat here is just for demonstration purposes. OP uses grep to parse output of `ifconfig`, and I prefer not to change their command to ease understanding. Same goes for `-f`. – markalex Apr 26 '23 at 09:38
  • @tripleee, regarding `-w`: it feels like it would be better as a separate answer. – markalex Apr 26 '23 at 09:39

solutions

Test environments:

  • macOS 13.1
  • Raspberry Pi OS x64

  1. \b

\b matches a word boundary. It is brief and clear.

#!/usr/bin/env bash

IPs=$(ifconfig | grep -oE '192\.168\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([2-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])\b')
echo $IPs
# 192.168.18.195


  2. [[:space:]]

This form is less common and, to me, harder to read.

[[:space:]] is a POSIX character class that matches a single whitespace character, so it does not match an IP that sits at the very end of a line.

#!/usr/bin/env bash

IPs=$(ifconfig | grep -oE '192\.168\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([2-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])[[:space:]]')
echo $IPs
# 192.168.18.195


  3. grep -w

-w is a grep command-line option rather than regex syntax, so it works only with grep and does not carry over to other regular expression engines.

#!/usr/bin/env bash

IPs=$(ifconfig | grep -woE '192\.168\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([2-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])')
echo $IPs
# 192.168.18.195


$ man grep

# ...

     -w, --word-regexp
             The expression is searched for as a word (as if surrounded by
             ‘[[:<:]]’ and ‘[[:>:]]’; see re_format(7)).  This option has no
             effect if -x is also specified.
# ... 
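A small sketch of how `-w` differs from a trailing `[[:space:]]` when the address is the last thing on a line. The input line and the variable names (`line`, `ip_re`, `with_space`, `with_w`) are hypothetical and only serve the illustration.

```shell
#!/usr/bin/env bash

# IP pattern from the question; the last octet is limited to 2..254.
ip_re='192\.168\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([2-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])'

# Hypothetical input: a valid address with nothing after it on the line.
line='gateway 192.168.18.195'

# [[:space:]] needs a real whitespace character after the address,
# so nothing matches at end of line.
with_space=$(printf '%s\n' "$line" | grep -oE "$ip_re"'[[:space:]]')

# -w only requires that the match is not adjacent to a word character,
# so end of line is accepted.
with_w=$(printf '%s\n' "$line" | grep -woE "$ip_re")

printf 'trailing [[:space:]]: "%s"\n' "$with_space"
printf 'grep -w:              "%s"\n' "$with_w"
```

This is why `([[:space:]]|$)` is the safer spelling if the boundary has to be written inside the pattern itself.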

refs


https://en.wikipedia.org/wiki/Regular_expression#Character_classes

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Assertions#:~:text=it%20in%20%22eat%22.-,%5Cb,-Matches%20a%20word

https://www.tutorialspoint.com/unix/unix-regular-expressions.htm

xgqfrms
  • `[[:space:]]` is not specific to Linux or GNU; it is a POSIX standard. The problem with `[[:space:]]` is that it only matches actual whitespace characters, not end of line; so it will find a word boundary only where that is a literal whitespace character, but not adjacent to beginning or end of line. It's hardly harder to understand than `\b`, but obviously means something else. (Some regex variants support `\s` instead of `[[:space:]]` but again, that's not portable or robust.) – tripleee Apr 27 '23 at 05:35
  • The problem with `\b` is that it is less portable. I will repeat my suggestion to prefer `-w` instead. – tripleee Apr 27 '23 at 05:36
  • While `grep -w` works, it's not what I want. – xgqfrms Apr 27 '23 at 12:29
  • Why not? Why do you say it's "not compatible with standard regular expressions"? Are you trying to say you want to express the boundary condition within the regex itself? – tripleee Apr 27 '23 at 12:59
  • You could use `(^|[^[:alnum:]_])` for a beginning word boundary and `([^[:alnum:]_]|$)` for an ending word boundary, but my experience is that some regex implementations don't properly support this either. – tripleee Apr 27 '23 at 13:00
  • I mean in the standard regex world, there is no `-w`. – xgqfrms Apr 27 '23 at 16:38
  • `\b` exists in most programming languages, such as shell, JavaScript, Python, Java, and so on. That is what I meant by common. – xgqfrms Apr 27 '23 at 16:46
  • But `[[:space:]]` only exists in the `POSIX` world, so it's not very common, as far as I'm concerned. I've never seen `[[:space:]]` before. – xgqfrms Apr 27 '23 at 16:50
  • That's balderdash; because it's in POSIX, it's supported on every even vaguely modern platform. Not having seen the POSIX classes in actual use doesn't mean they're not there. – tripleee Apr 28 '23 at 03:25
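The word-boundary emulation suggested in the comments above, built only from POSIX bracket expressions, could look roughly like this. It is a sketch, not a tested portable solution: the variable names are made up, the sample line is taken from the question, and the second `grep` pass is an assumption added here to strip the boundary characters that `-o` would otherwise include.

```shell
#!/usr/bin/env bash

# Sample line taken from the ifconfig output in the question.
sample='inet 192.168.18.195 netmask 0xffffff00 broadcast 192.168.18.255'

# IP pattern from the question; the last octet is limited to 2..254.
ip_re='192\.168\.(1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.([2-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])'

# Emulated word boundaries: start of line or a non-word character before,
# a non-word character or end of line after.
boundary_start='(^|[^[:alnum:]_])'
boundary_end='([^[:alnum:]_]|$)'

# Pass 1 keeps addresses that have a boundary on both sides
# (the boundary characters are part of the match).
# Pass 2 extracts just the address from each pass-1 match.
ips=$(printf '%s\n' "$sample" \
  | grep -oE "${boundary_start}${ip_re}${boundary_end}" \
  | grep -oE "$ip_re")

printf '%s\n' "$ips"
```

Because the boundary characters are consumed by the first pass, two addresses separated by a single character would not both be found; for ifconfig-style output that is unlikely to matter.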