Regex delete lines after match

Question

I'm trying to match the domain example.com, and I would like to delete all the IPs beneath it.

Input:

[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33

Desired output:

[example.net]
10.100.251.22
10.100.251.33

Here is what I have tried so far:

\[example.com\](\s+^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$)*

It works, but I'm not sure it's efficient.

I'm doing my regex testing with Rubular; here is a sample:

http://rubular.com/r/cavVHWPvT2

This doesn't seem like a job for a regex. What do you mean by delete? — Ryan, Oct 30 '16 at 06:40
Why don't you try putting the second part into an array, then looping over it and checking whether each item is contained in the first part? If it matches, delete it. — Tân, Oct 30 '16 at 06:50

score 1 · Answer 1 · edited May 23 '17 at 12:16

I wouldn't bother with a complex regex; I'd do it using Ruby's slice_before:

data = '[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33
'

data.lines.slice_before(/\A\[/).select { |ary| ary.first[/example\.net/] }.join
# => "[example.net]\n10.100.251.22\n10.100.251.33\n"

Breaking it down:

data
  .lines # => ["[example.com]\n", "10.100.251.1\n", "10.100.251.2\n", "10.100.251.3\n", "[example.net]\n", "10.100.251.22\n", "10.100.251.33\n"]
  .slice_before(/\A\[/) # => #<Enumerator: #<Enumerator::Generator:0x007f987b8b4528>:each>
  .select { |ary| ary.first[/example\.net/] } # => [["[example.net]\n", "10.100.251.22\n", "10.100.251.33\n"]]
  .join # => "[example.net]\n10.100.251.22\n10.100.251.33\n"

Regular expressions are great, and I use them when necessary, but they're not always the best tool for a task. They can be very fragile and very treacherous, and greatly increase the burden of maintaining code, especially as they get more complex.
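
Since the question asks to delete the example.com section rather than to select example.net, the same slicing can be turned around with reject. This is a minimal sketch: the variable names `unwanted` and `kept` are mine, and it assumes every section header starts in column one with `[`.

```ruby
data = <<~DATA
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
DATA

unwanted = '[example.com]' # header of the section to drop (illustrative name)

kept = data.lines
           .slice_before { |line| line.start_with?('[') } # one array per section
           .reject { |section| section.first.chomp == unwanted }
           .join

puts kept
```

Comparing the header as a plain string avoids having to escape `[`, `.`, and `]` at all.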

This could also be accomplished using a flip-flop but explaining that is left to a different question: "What is a flip-flop operator?".

Tim Biegeleisen · Answer 2 · 2016-10-30T06:53:05.163

0

Try this:

Find:

\[example\.com\].*?(\[(?:(?!example\.com).)*?\])

Replace:

$1

Regex101
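
For anyone trying this find/replace from Ruby instead of an editor, here is a rough equivalent. It assumes the editor's "dot matches newline" setting, which Ruby spells as the `/m` flag, and it uses `'\1'` where the editor uses `$1`. The names `pattern` and `result` are only for illustration.

```ruby
str = <<~DATA
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
DATA

# /m lets . match newlines, so .*? can span the IP lines.
pattern = /\[example\.com\].*?(\[(?:(?!example\.com).)*?\])/m

# Replace the whole match with group 1, i.e. the next section header.
result = str.sub(pattern, '\1')

puts result
```

Note that the pattern needs a following `[...]` header to match, so an example.com section at the very end of the input would be left untouched.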

edited Oct 30 '16 at 06:53

answered Oct 30 '16 at 06:43

Tim Biegeleisen


First of all, update your question with the tool you are using. My regex would work in a tool such as Notepad++, but perhaps not yours. – Tim Biegeleisen Oct 30 '16 at 06:59
`.*` means match any character, zero or more times. `.*?` means match any character zero or more times, but it is a non greedy match. – Tim Biegeleisen Oct 30 '16 at 07:03
Explore this regex using the link provided. – Tim Biegeleisen Oct 30 '16 at 07:08

Cary Swoveland · Answer 3 · 2016-10-30T07:56:44.273

We are given

str =<<-END
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33
END
  #=> "[example.com]\n10.100.251.1\n10.100.251.2\n10.100.251.3\n[example.net]\n10.100..."

The question is a bit confusing in that the desired output is said to be

[example.net]
10.100.251.22
10.100.251.33

but that is also what is to be deleted. What follows returns the lines that are not deleted, but it would be a simple matter to change it to return the deleted bits. Also, the question doesn't make clear if the string "[example.net]" is known or if it's just an example of what might follow the "[example.com]" "block". Nor is it clear if there are exactly two "blocks", as in the example, or there could be one or more than two blocks.

If you know "[example.net]" immediately follows the "[example.com]" block, you could write

r = /
    \[example\.com\]     # match string
    .*?                  # match any number of characters, lazily
    (?=\[example\.net\]) # match string in positive lookahead
    /mx                  # multiline and free-spacing modes

puts str[r]
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3

If you don't know what follows the "[example.com]" "block", except that the first line of the following block, if there is one, contains at least one character other than a digit or period, you could write

r = /
    \[example\.com\]\n  # match string
    .*?                 # match any number of any characters, lazily
    (?:[\d.]*\n)        # match a string of zero or more digits and periods,
                        # followed by a newline, in a non-capture group
    +                   # match the above non-capture group > 0 times
    /x                  # free-spacing mode

puts str[r]
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
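
To get the other half, that is, the text that remains once the matched block is removed, one option is to pass the same regex to String#sub with an empty replacement. The sketch below reuses the first regex from this answer, so it carries the same assumption that "[example.net]" follows the block; `removed` and `remaining` are illustrative names.

```ruby
str = <<~DATA
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
DATA

r = /\[example\.com\].*?(?=\[example\.net\])/m

removed   = str[r]         # the "[example.com]" block
remaining = str.sub(r, '') # everything else

puts remaining
```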

@TimBiegeleisen, there certainly are similarities, but differences too, as I'm returning the keepers and you're returning the removals. — Cary Swoveland, Oct 30 '16 at 07:55
Thanks @CarySwoveland and Tim, I'm having trouble running the example in my terminal. Do you think you could help me with a sample on http://rubular.com ? Thanks — Deano, Oct 30 '16 at 08:02
@CarySwoveland No, I am returning the keepers. Try it in Notepad++ and you will see. — Tim Biegeleisen, Oct 30 '16 at 08:46
@Tim, then why the capture group followed by `$1` for the non-keepers? Ah, because you interpreted those to be the keepers. Whose interpretation of the question is correct? I have no idea. The question is unclear. — Cary Swoveland, Oct 30 '16 at 22:34

score 0 · Answer 4 · answered Oct 30 '16 at 08:48

Your regex is very close. What you're missing is a bit of grouping and a linebreak construct in the right place:

/^\[example\.com\]\R*(?:(?:\d{1,3}\.){3}\d{1,3}\R*)*/

See the Rubular demo

Details:

^ - start of line
\[example\.com\] - [example.com] literal substring
\R* - zero or more linebreaks (for older Ruby versions, use (?:\r?\n|\r)*)
(?:(?:\d{1,3}\.){3}\d{1,3}\R*)* - zero or more sequences of
- (?:\d{1,3}\.){3} - 3 sequences of 1 to 3 digits and a dot
- \d{1,3} - 1 to 3 digits
- \R* - 0+ linebreaks

And a Ruby demo:

str =<<DATA
[example.com]
10.100.251.1
10.100.251.2
10.100.251.3
[example.net]
10.100.251.22
10.100.251.33
DATA
rx = /^\[example\.com\]\R*(?:(?:\d{1,3}\.){3}\d{1,3}\R*)*/
puts str[rx]

Wiktor Stribiżew


We end up with almost the same regular expression, but I still think `\s*` is better than `\R*`. Either one claims an explicit, precise format, in which case there should be no `*` quantifiers, or let's allow spaces after the IPs :) – Aleksei Matiushkin Oct 30 '16 at 08:55
`\s` also matches horizontal whitespace, so `[example.com]78.78.89.67556.87.87.87` can also be matched. I understand they must be on the subsequent lines. – Wiktor Stribiżew Oct 30 '16 at 09:03
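
The difference discussed in these comments can be seen with a small experiment. The `sample` string is made up for the purpose (it is not from the question): it puts two IPs on the header line itself, separated by spaces. `with_r` is the regex from the answer and `with_s` is the same regex with `\s*` swapped in.

```ruby
# Hypothetical input: IPs on the same line as the header.
sample = "[example.com] 10.0.0.1 10.0.0.2\n[example.net]\n10.0.0.3\n"

with_r = /^\[example\.com\]\R*(?:(?:\d{1,3}\.){3}\d{1,3}\R*)*/
with_s = /^\[example\.com\]\s*(?:(?:\d{1,3}\.){3}\d{1,3}\s*)*/

p sample[with_r] # => "[example.com]"
p sample[with_s] # => "[example.com] 10.0.0.1 10.0.0.2\n"
```

Which behaviour is preferable depends on how strictly the input format is defined, which the question leaves open.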

score 0 · Answer 5 · answered Oct 31 '16 at 03:11

Treat Your Data Like an INI File: Scan for Sections

One way to deal with your data is to treat it like an INI file. A regex with the multi-line option enabled can break a string representation of your INI file into an array of sections as follows:

ini = <<~'EOF'
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
EOF

# Scan for INI section headers.
sections = ini.scan(/^\[.*?\]$[^\[]*/m)

You can then extract just the sections you want using Enumerable#grep. For example, to extract the example.net section:

section_title = 'example.net'
sections.grep(/\A\[#{Regexp.escape section_title}\]\s*$/)
#=> ["[example.net]\n10.100.251.22\n10.100.251.33\n"]
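
If the goal is to delete a section rather than extract one, the inverse of grep is available as Enumerable#grep_v (Ruby 2.3 and later). A sketch along the same lines, with `unwanted` and `remaining` as illustrative names:

```ruby
ini = <<~'EOF'
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
EOF

sections = ini.scan(/^\[.*?\]$[^\[]*/m)

unwanted = 'example.com'

# Keep every section whose header does NOT match, then stitch them back together.
remaining = sections.grep_v(/\A\[#{Regexp.escape(unwanted)}\]\s*$/).join

puts remaining
```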

Caveats

The multi-line regex above assumes you have the entire file loaded as a single String object. If you're doing something else, you may need a different approach.
Note the importance of Regexp#escape, which ensures that your string is properly converted for use in a regex pattern. Otherwise, characters like [, ., and ] would not match as you might expect.
INI files can be more complex than your sample data. You might consider writing a real INI parser, or using a gem like inifile, rather than trying to handle all the possible edge cases in one regular expression.
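
As one example of a "different approach" for the first caveat, the input can be read line by line with a flag that tracks whether the current section is the one being dropped. This is only a sketch: `drop_section` is a hypothetical helper, not part of any library, and StringIO stands in for a real file handle.

```ruby
require 'stringio'

# Hypothetical helper: returns the text of io without the named section.
def drop_section(io, title)
  skipping = false
  io.each_line.reject do |line|
    # Each header line decides whether the lines that follow are skipped.
    skipping = (line.chomp == "[#{title}]") if line.start_with?('[')
    skipping
  end.join
end

io = StringIO.new(<<~'EOF')
  [example.com]
  10.100.251.1
  10.100.251.2
  10.100.251.3
  [example.net]
  10.100.251.22
  10.100.251.33
EOF

puts drop_section(io, 'example.com')
```

The same method should work with an open File, for instance `File.open(path) { |f| drop_section(f, 'example.com') }`, where `path` is whatever file holds the data.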
