
I need to be able to find and replace sensitive data like IP addresses in log files so that I can send them to a vendor for technical support.

The trouble is that the log files also contain version numbers that look like IP addresses but with extra digits.

The regex I've got so far picks up IP addresses just fine:

(((25[0-5]){1,3}|(2[0-4]|(1\d|[1-9]|)\d)){1,3}\.?){4}

The trouble is that it also matches version numbers such as 1555.2655.3255.1594.

I thought that using {1,3} would limit it to a max of 3 digits but it isn't working like that.
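For anyone wondering why {1,3} does not cap the digit count, here is a minimal sketch. It is written in Python purely so it is easy to run (my assumption; the question itself is about PowerShell). In the pattern, each {1,3} follows a group, so it repeats that whole group one to three times instead of limiting it to three digits, and \.? makes every dot optional.

```python
import re

# The pattern from the question, unchanged.
QUESTION_PATTERN = re.compile(
    r"(((25[0-5]){1,3}|(2[0-4]|(1\d|[1-9]|)\d)){1,3}\.?){4}"
)

# {1,3} repeats the octet group itself, so one "octet" may consist of up to
# three digit runs (e.g. "15" + "55"), and the optional dot means four runs
# of digits need no dots between them at all.
candidates = ["10.0.0.1", "1555.2655.3255.1594", "1234"]
results = {c: QUESTION_PATTERN.fullmatch(c) is not None for c in candidates}

for candidate, matched in results.items():
    print(f"{candidate!r} fully matches: {matched}")
```

All three strings match in full, which is why the version numbers get picked up along with the real addresses.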

I'm using PowerShell to parse the files and below is a mock-up of the type of formatting I'm dealing with:

test 127.1.1.1test test 10.0.0.1 test test 172.28.69.77test test 15.26.32.159 test test 15.26.32.1594test test 1.26.3255.1594test test 1555.2655.3255.1594test 255.255.255.192 256.255.255.0 999.999.999.999

bobble bubble
BaldFeegle
  • Does this answer your question? [Validating IPv4 addresses with regexp](https://stackoverflow.com/questions/5284147/validating-ipv4-addresses-with-regexp) – mmh4all Apr 13 '23 at 09:57
  • What tool/lang do you use? The IPs need to be extracted from some text, how are they separated from the text? (e.g. always by whitespace or at start/end... [try this one](https://regex101.com/r/1KhptF/2)). – bobble bubble Apr 13 '23 at 10:22
  • I've tried every variation on that page but none have worked for me. Part of the problem is that the addresses are buried in huge text files, and are often directly adjacent to text, so any regex that uses \b, $ or ^ doesn't work. – BaldFeegle Apr 13 '23 at 10:33
  • @BaldFeegle So [this regex](https://regex101.com/r/1KhptF/3) does not work for you? (It is not from the linked page). What regex flavor are you using and how are the IPs separated? Please edit your question and provide some sample-data, otherwise it's impossible to help. – bobble bubble Apr 13 '23 at 10:38
  • @bobblebubble I'm using powershell to run a find/replace across a folder full of log files. Below is a mock up of the type of formatting I'm dealing with. test 127.1.1.1test test 10.0.0.1 test test 172.28.69.77test test 15.26.32.159 test test 15.26.32.1594test test 1.26.3255.1594test test 1555.2655.3255.1594test 255.255.255.192 256.255.255.0 999.999.999.999 – BaldFeegle Apr 13 '23 at 10:40
  • @BaldFeegle [This one](https://regex101.com/r/Ev7EW0/1) might do the job. – bobble bubble Apr 13 '23 at 10:50
  • @bobblebubble That seems perfect. Thank you. How do I mark this comment as the answer? – BaldFeegle Apr 13 '23 at 10:59
  • @BaldFeegle I put it as an answer, glad that helped. Would be nice if you edit your question and mention tool (tag powershell) plus include the sample data that you provided in the comments. – bobble bubble Apr 13 '23 at 11:09

2 Answers


This might work for you.

(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
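
A quick sketch of what this pattern picks up, assuming Python's re module for convenience (the thread itself is about PowerShell). SAMPLE is a shortened excerpt of the mock-up data from the question, and the variable names are only illustrative.

```python
import re

# The pattern from this answer, unchanged.
SIMPLE_PATTERN = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

# Shortened excerpt of the mock-up data from the question.
SAMPLE = (
    "test 15.26.32.159 test test 15.26.32.1594test "
    "256.255.255.0 999.999.999.999"
)

# With a single capturing group, findall returns the text of that group.
matches = SIMPLE_PATTERN.findall(SAMPLE)
print(matches)
```

Besides the real address, this also returns 15.26.32.159 cut out of the version number 15.26.32.1594, plus the out-of-range 256.255.255.0 and 999.999.999.999, because nothing in the pattern checks what surrounds the match or whether each part is at most 255.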
kaiinge
  • This is the best I've seen so far. It'll match invalid IP addresses like 999.999.999.999 but for my purposes I don't think that will be a problem. Thanks. – BaldFeegle Apr 13 '23 at 10:35
  • Jeffrey Friedl's book includes a thorough walk-through on capturing IP addresses using regex. – kaiinge Apr 13 '23 at 11:03

If the IPs can even be directly adjacent to letters in the text, set the boundaries with negative lookarounds: (?<![\d.]) asserts that the match is not preceded by a digit or dot, and (?![\d.]) asserts that it is not followed by one.

(?<![\d.])(?:(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.){3}(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])(?![\d.])

See this demo at regex101
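
As a rough illustration of how the lookaround version behaves on the mock-up data from the question, here is a minimal sketch in Python (chosen only to make it easy to run; that is my assumption, not the asker's setup). The redact helper and the x.x.x.x placeholder are hypothetical names for illustration.

```python
import re

# The lookaround pattern from above, split across lines for readability.
IPV4 = re.compile(
    r"(?<![\d.])"                                    # no digit or dot before
    r"(?:(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.){3}"  # three octets, each with a dot
    r"(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])"           # fourth octet
    r"(?![\d.])"                                     # no digit or dot after
)

# The mock-up line from the question.
SAMPLE = (
    "test 127.1.1.1test test 10.0.0.1 test test 172.28.69.77test "
    "test 15.26.32.159 test test 15.26.32.1594test test 1.26.3255.1594test "
    "test 1555.2655.3255.1594test 255.255.255.192 256.255.255.0 999.999.999.999"
)


def redact(text, placeholder="x.x.x.x"):
    """Replace every matched IPv4 address with a placeholder (hypothetical helper)."""
    return IPV4.sub(placeholder, text)


found = IPV4.findall(SAMPLE)
redacted = redact(SAMPLE)
print(found)
print(redacted)
```

On this input only the five valid addresses are found; the version-like strings, 256.255.255.0 and 999.999.999.999 are left untouched. PowerShell's -replace operator uses the .NET regex engine, which also supports lookbehind, so the same pattern should carry over there.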


If the IPs are separated by whitespace, the pattern can be shortened to:

(?<!\S)(?:(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.?\b){4}(?!\S)

Another demo at regex101

The trick used to shorten the pattern is an optional dot \.? that is effectively forced by a word boundary \b after each of the {4} repetitions (even at the end). In the first scenario this technique can't be used because, e.g., in 1.2.3.4abc there is no word boundary after the IP.
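
To make the difference concrete, here is a small sketch of the shortened pattern, again assuming Python just for runnability. It uses the same mock-up line from the question, plus the 1.2.3.4abc case mentioned above.

```python
import re

# The whitespace-delimited pattern from above, unchanged.
IPV4_WS = re.compile(
    r"(?<!\S)(?:(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.?\b){4}(?!\S)"
)

# The mock-up line from the question.
SAMPLE = (
    "test 127.1.1.1test test 10.0.0.1 test test 172.28.69.77test "
    "test 15.26.32.159 test test 15.26.32.1594test test 1.26.3255.1594test "
    "test 1555.2655.3255.1594test 255.255.255.192 256.255.255.0 999.999.999.999"
)

# Only addresses with whitespace (or start/end of text) on both sides qualify.
found_ws = IPV4_WS.findall(SAMPLE)
print(found_ws)

# No word boundary between "4" and "a", so the glued-on case is skipped.
glued = IPV4_WS.search("1.2.3.4abc")
print(glued)
```

Here 127.1.1.1test and 172.28.69.77test are skipped because letters follow the address directly, which is exactly the case the first pattern is meant to handle.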


The part of the IP pattern that matches 0-255, (?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5]), was derived from an online regex range generator (and slightly shortened). There is one at Stack Overflow as well.

bobble bubble