
I'm trying to match the domain example.com and delete it together with all the IPs beneath it.

Input:

[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33

Desired output:

[example.net]
10.100.251.22
10.100.251.33

Here is what I have tried so far:

\[example.com\](\s+^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$)*

It works, but I'm not sure if that's efficient.

I'm doing my regex testing with Rubular; here is a sample:

http://rubular.com/r/cavVHWPvT2

SLePort
Deano
  • this doesn't seem like the job for a regex, what do you mean delete? – Ryan Oct 30 '16 at 06:40
  • Well I would like to target these entries for deletion – Deano Oct 30 '16 at 06:41
  • Why don't you try putting the second part into an array, then looping over it and checking whether each entry is contained in the first part? If it matches, delete it. – Tân Oct 30 '16 at 06:50

5 Answers


I wouldn't bother with a complex regex; I'd do it using Ruby's slice_before:

data = '[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33
'

data.lines.slice_before(/\A\[/).select { |ary| ary.first[/example\.net/] }.join
# => "[example.net]\n10.100.251.22\n10.100.251.33\n"

Breaking it down:

data
  .lines # => ["[example.com]\n", "10.100.251.1\n", "10.100.251.2\n", "10.100.251.3\n", "[example.net]\n", "10.100.251.22\n", "10.100.251.33\n"]
  .slice_before(/\A\[/) # => #<Enumerator: #<Enumerator::Generator:0x007f987b8b4528>:each>
  .select { |ary| ary.first[/example\.net/] } # => [["[example.net]\n", "10.100.251.22\n", "10.100.251.33\n"]]
  .join # => "[example.net]\n10.100.251.22\n10.100.251.33\n"
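
The pipeline above keeps the section you name. If the goal is instead to drop `[example.com]` and keep whatever else is present, the same `slice_before` grouping could be paired with `reject`. This is only a sketch: the `data` string mirrors the sample input, and `kept` is a name chosen here for illustration.

```ruby
data = <<~DATA
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
DATA

# Group the lines into sections, each starting at a "[...]" header line,
# then drop the section whose header line starts with [example.com].
kept = data.lines
           .slice_before(/\A\[/)
           .reject { |section| section.first.start_with?('[example.com]') }
           .join

puts kept
```

Unlike the `select` version, this does not need to know the names of the sections that should survive.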

Regular expressions are great, and I use them when necessary but they're not always the best tool for a task. They can be very fragile and very treacherous, and greatly increase the task of maintaining code, especially as they get more complex.

This could also be accomplished using a flip-flop but explaining that is left to a different question: "What is a flip-flop operator?".

the Tin Man

Try this:

Find:

\[example\.com\].*?(\[(?:(?!example\.com).)*?\])

Replace:

$1

Regex101
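
In Ruby, the find/replace pair above might be applied with `String#sub`. This is a sketch that assumes the dot is meant to match newlines (the `/m` flag in Ruby) and writes the `$1` backreference as `'\1'`; `input`, `pattern` and `result` are illustrative names. Note that if `[example.com]` is the last section, there is no following header for the capture group to match, so the string comes back unchanged.

```ruby
input = "[example.com]\n10.100.251.1\n10.100.251.2\n10.100.251.3\n" \
        "[example.net]\n10.100.251.22\n10.100.251.33\n"

# /m lets "." match newlines, so the lazy ".*?" can span the IP lines.
pattern = /\[example\.com\].*?(\[(?:(?!example\.com).)*?\])/m

# Replace the whole match with the captured header of the next section.
result = input.sub(pattern, '\1')

puts result
```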

Tim Biegeleisen

We are given

str =<<-END
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33
END
  #=> "[example.com]\n10.100.251.1\n10.100.251.2\n10.100.251.3\n[example.net]\n10.100..."

The question is a bit confusing in that the desired output is said to be

[example.net]
10.100.251.22
10.100.251.33

but that is also what is to be deleted. What follows returns the lines that are not deleted, but it would be a simple matter to change it to return the deleted bits. Also, the question doesn't make clear if the string "[example.net]" is known or if it's just an example of what might follow the "[example.com]" "block". Nor is it clear if there are exactly two "blocks", as in the example, or there could be one or more than two blocks.

If you know "[example.net]" immediately follows the "[example.com]" block, you could write

r = /
    \[example\.com\]     # match string
    .*?                  # match any number of characters, lazily
    (?=\[example\.net\]) # match string in positive lookahead
    /mx                  # multiline and free-spacing modes

puts str[r]
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3

If you don't know what follows the "[example.com]" "block", except that the first line of the following block, if there is one, contains at least one character other than a digit or period, you could write

r = /
    \[example\.com\]\n  # match string
    .*?                 # match any number of any characters, lazily
    (?:[\d.]*\n)        # match a string containing >= 0 digits and periods,
                        # followed by a newline, in a non-capture group
    +                   # match the above non-capture group > 0 times
    /x                  # free-spacing mode

puts str[r]
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
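
The `puts str[r]` calls above print the `[example.com]` block. To get the rest of the string with that block removed, the match could be replaced with an empty string. A minimal sketch using the first pattern written on one line, and assuming as before that `[example.net]` immediately follows; `remaining` is an illustrative name.

```ruby
str = <<~TEXT
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
TEXT

# Same pattern as the first example: lazy match up to the [example.net] header.
r = /\[example\.com\].*?(?=\[example\.net\])/m

# Substitute the matched block with nothing, leaving the other section.
remaining = str.sub(r, '')

puts remaining
```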
Cary Swoveland

Your regex is very close. What you're missing is a bit of grouping and a linebreak construct in the right place:

/^\[example\.com\]\R*(?:(?:\d{1,3}\.){3}\d{1,3}\R*)*/

See the Rubular demo

Details:

  • ^ - start of line
  • \[example\.com\] - [example.com] literal substring
  • \R* - zero or more linebreaks (for older Ruby versions, use (?:\r?\n|\r)*)
  • (?:(?:\d{1,3}\.){3}\d{1,3}\R*)* - zero or more sequences of
    • (?:\d{1,3}\.){3} - 3 sequences of 1 to 3 digits and a dot
    • \d{1,3} - 1 to 3 digits
    • \R* - 0+ linebreaks

And a Ruby demo:

str =<<DATA
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33
DATA
rx = /^\[example\.com\]\R*(?:(?:\d{1,3}\.){3}\d{1,3}\R*)*/
puts str[rx]
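
To delete the matched block rather than print it, the same pattern could be passed to `String#sub`. The sketch below wraps it in a hypothetical helper, `delete_section`, that builds the pattern for any domain via `Regexp.escape`; the helper name and its parameters are assumptions for illustration, not part of the answer above.

```ruby
# Hypothetical helper: remove the "[domain]" header and the IPv4 lines under it.
def delete_section(text, domain)
  # Regexp.escape keeps "." in the domain from acting as a wildcard.
  rx = /^\[#{Regexp.escape(domain)}\]\R*(?:(?:\d{1,3}\.){3}\d{1,3}\R*)*/
  text.sub(rx, '')
end

str = "[example.com]\n10.100.251.1\n10.100.251.2\n10.100.251.3\n" \
      "[example.net]\n10.100.251.22\n10.100.251.33\n"

puts delete_section(str, 'example.com')
```

Because the IP group stops at the first line that is not an address, the next header is left untouched.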
Wiktor Stribiżew
  • We end up with almost the same regular expressions, but I still think `\s*` is better than `\R*`. Either one claims an explicit, precise format, in which case there should be no `*` matchers, or let’s allow spaces after IPs :) – Aleksei Matiushkin Oct 30 '16 at 08:55
  • `\s` matches horizontal whitespace, so `[example.com]78.78.89.67556.87.87.87` can also be matched. I understand they must be on the subsequent lines. – Wiktor Stribiżew Oct 30 '16 at 09:03

Treat Your Data Like an INI File: Scan for Sections

One way to deal with your data is to treat it like an INI file. A regex with the multi-line option enabled can break a string representation of your INI file into an array of sections as follows:

ini = <<~'EOF'
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
EOF

# Scan for INI section headers.
sections = ini.scan /^\[.*?\]$[^\[]*/m

You can then extract just the sections you want using Enumerable#grep. For example, to extract the example.net section:

section_title = 'example.net'
sections.grep /\A\[#{Regexp.escape section_title}\]\s*$/
#=> ["[example.net]\n10.100.251.22\n10.100.251.33\n"]
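
For the deletion asked about in the question, the same `sections` array could be filtered with `Enumerable#grep_v` (the inverse of `grep`) and joined back into a string. This is a sketch under the same assumptions as above; `unwanted`, `header` and `remaining` are names picked for illustration.

```ruby
ini = <<~'EOF'
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
EOF

# Split the text into sections, one string per "[...]" header.
sections = ini.scan(/^\[.*?\]$[^\[]*/m)

unwanted = 'example.com'
header = /\A\[#{Regexp.escape(unwanted)}\]\s*$/

# Keep every section whose header does NOT match, then reassemble the text.
remaining = sections.grep_v(header).join

puts remaining
```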

Caveats

  1. The multi-line regex above assumes you have the entire file loaded as a single String object. If you're doing something else, you may need a different approach.
  2. Note the importance of Regexp#escape, which ensures that your string is properly converted for use in a regex pattern. Otherwise, characters like [, ., and ] would not match as you might expect.
  3. INI files can be more complex than your sample data. You might consider writing a real INI parser, or using a gem like inifile, rather than trying to handle all the possible edge cases in one regular expression.
Todd A. Jacobs