I have several Apache vHost configurations across several hosts. I'm trying to write a Bash script that will iterate through each host and search the .conf file on each one, pulling out the first (only the first) <VirtualHost> block. I've tried writing a regex to match it, but it's just not working. Here's the code I've tried:

    #!/bin/bash
    egrep -o '(\<VirtualHost\>)(.*)(\<\/VirtualHost\>)' -m1

Since .* doesn't match newlines, I even tried this:

    #!/bin/bash
    egrep -o '(\<VirtualHost\>)(.*[\S]*)(\<\/VirtualHost\>)' -m1

I still get nothing. :-(

I don't understand what I'm doing wrong here. Here is a sample of the data I'm trying to match:

    <VirtualHost apache-frontend:80>
            ServerAdmin     mysite@domain.com
            ServerName      domain.com
            DocumentRoot    /path/to/my/doc/root

            RewriteEngine   On
            Include         include.d/global/rewrite.conf
            RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]
    </VirtualHost>

    <VirtualHost apache-frontend:80>
            ServerAdmin     mysite@domain.com
            ServerName      domain.com
            DocumentRoot    /path/to/my/doc/root

            RewriteEngine   On
            Include         include.d/global/rewrite.conf
            RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]
    </VirtualHost>

    <VirtualHost apache-frontend:80>
            ServerAdmin     mysite@domain.com
            ServerName      domain.com
            DocumentRoot    /path/to/my/doc/root

            RewriteEngine   On
            Include         include.d/global/rewrite.conf
            RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]
    </VirtualHost>
Benjamin W.
misteralexander
  • `grep` works line-wise. It doesn't match multi-line content. – Etan Reisner Mar 11 '16 at 01:38
  • This is not a bash question: `egrep` works exactly the same way no matter what shell it's invoked from, or if it's invoked without any shell at all. If you want a bash script that will do what you're asking for regardless of which tools it's using (ie. potentially using awk or native shell logic rather than egrep), then the question could probably stand some modification. – Charles Duffy Mar 11 '16 at 01:48
  • As @EtanReisner mentioned, grep and egrep both work line by line, and implementations also differ across flavors of *nix. There is no guarantee that grep or egrep was built with PCRE support, which is the key to matching across lines. I suggest writing a script in a language such as Python or Perl to avoid these issues. See my post below. – Saleem Mar 11 '16 at 03:26
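
The line-wise behaviour described in the comments can be checked with a small experiment. This is only a sketch: the three-line sample, the temp file and the variable names are made up for illustration. Note also that in GNU grep `\<` and `\>` are word-boundary anchors, not literal angle brackets, so the tags should be written as plain `<` and `>`.

```shell
#!/bin/bash
# Hypothetical three-line sample; not taken from a real server.
conf=$(mktemp)
printf '%s\n' \
    '<VirtualHost apache-frontend:80>' \
    '        ServerName      domain.com' \
    '</VirtualHost>' > "$conf"

# grep tests each line separately, so a pattern that needs both tags
# in a single match counts zero lines. "|| true" keeps the "no match"
# exit status from stopping a script that runs under "set -e".
both_tags=$(grep -E -c '<VirtualHost.*</VirtualHost>' "$conf" || true)

# Each tag sits on its own line, and those lines are found without trouble.
either_tag=$(grep -E -c '</?VirtualHost' "$conf" || true)

echo "lines matching both tags: $both_tags"
echo "lines matching either tag: $either_tag"
rm -f "$conf"
```

So the fix is not a cleverer regex for grep but a tool that keeps state across lines, which is what the answers below do.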

6 Answers

This one-liner pulls only the first VirtualHost block from a config file:

    awk '/<VirtualHost/,/<\/VirtualHost>/{print $0} /<\/VirtualHost>/{exit}' < vhostconf
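
For use inside a larger script, the same idea can be spelled out with an explicit flag, which some people find easier to read than the range pattern. This is a sketch, not the answerer's code; `first_vhost` and `inside` are hypothetical names.

```shell
#!/bin/bash
# first_vhost is a hypothetical helper name, not part of any tool.
# Usage: first_vhost /path/to/vhosts.conf
first_vhost() {
    awk '
        /<VirtualHost/              { inside = 1 }  # opening tag seen
        inside                      { print }       # copy every line of the block
        inside && /<\/VirtualHost>/ { exit }        # stop after the first closing tag
    ' "$1"
}
```

Because awk exits at the first closing tag, the rest of the file is never read.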
user2021201

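You could use the -B option to print the lines of context before the matching line, like this:

    grep -E '</VirtualHost>' -m1 -B8 yours.conf

The 8 is the number of lines that precede the closing tag in the sample block, so adjust it if your blocks are a different length.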
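
If the block length is not known in advance, one option is to ask grep for the line numbers of the first opening and closing tags and print that range. This is a sketch under the assumption that the first closing tag belongs to the first opening tag; `first_vhost_by_lines` is a hypothetical name.

```shell
#!/bin/bash
# first_vhost_by_lines is a hypothetical helper name.
first_vhost_by_lines() {
    local file=$1 start end
    # grep -n prefixes each match with its line number; -m1 stops after
    # the first match (-m is not in POSIX, but GNU and BSD grep have it).
    start=$(grep -n -m1 '<VirtualHost' "$file" | cut -d: -f1)
    end=$(grep -n -m1 '</VirtualHost>' "$file" | cut -d: -f1)
    # Give up if either tag is missing.
    [ -n "$start" ] && [ -n "$end" ] || return 1
    # Print only the lines from the opening tag through the closing tag.
    sed -n "${start},${end}p" "$file"
}
```

Usage would be `first_vhost_by_lines yours.conf`.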
cifer
  • It's worth noting that `grep -P` is a GNUism, not guaranteed to be available on all platforms where bash is supported. (Actually, it's not guaranteed to be present on all GNU platforms either; whether it's compiled into GNU grep is a matter of build-time configuration). – Charles Duffy Mar 11 '16 at 01:50
  • @CharlesDuffy Yeah, thanks for the reminder. I always prefer the Perl regex grammar, so I just followed my habit. The ERE grammar can handle this too, so I have updated my answer to use the `-E` option. – cifer Mar 11 '16 at 02:24

With GNU sed:

    $ sed -n '/<VirtualHost/,/<\/VirtualHost>/{p;/<\/VirtualHost>/q}' infile
        <VirtualHost apache-frontend:80>
                ServerAdmin     mysite@domain.com
                ServerName      domain.com
                DocumentRoot    /path/to/my/doc/root

                RewriteEngine   On
                Include         include.d/global/rewrite.conf
                RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]
        </VirtualHost>
  • -n suppresses sed's default printing of every line
  • /<VirtualHost/,/<\/VirtualHost>/ is an address range
  • For each line in the range, do {p;/<\/VirtualHost>/q}:
    • Print the line
    • If the line matches <\/VirtualHost>, i.e., is the last line of the block we want, then quit

To run this with BSD sed, add one more semicolon:

    sed -n '/<VirtualHost/,/<\/VirtualHost>/{p;/<\/VirtualHost>/q;}'
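
Since the question is about repeating this for several configuration files, here is a sketch of a loop around the portable form. `print_first_blocks` is a hypothetical helper, and how the files get fetched from each host is left out.

```shell
#!/bin/bash
# print_first_blocks is a hypothetical helper; pass it any number of .conf paths.
print_first_blocks() {
    local conf
    for conf in "$@"; do
        printf '== %s ==\n' "$conf"
        # sed runs once per file, so "q" only ends processing of that file.
        sed -n '/<VirtualHost/,/<\/VirtualHost>/{p;/<\/VirtualHost>/q;}' "$conf"
    done
}
```

It could be called as `print_first_blocks /etc/apache2/sites-enabled/*.conf`, where the directory is only an assumption about your layout.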
Benjamin W.

There is no guarantee that every platform has a PCRE-compatible grep available. You can write a custom script instead, which should work anywhere Python is available.

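    import re
    import sys

    # Capture everything between the first opening <VirtualHost ...> line
    # and the next closing tag; the lazy quantifiers stop at the first block.
    rx = r'(?<=<VirtualHost).*?\r?\n(.*?)(?=</VirtualHost>)'

    # Read the whole input at once so the pattern can span lines.
    data = sys.stdin.read()

    # re.DOTALL lets "." match newlines as well.
    match = re.search(rx, data, re.DOTALL)
    if match:
        print(match.group(1))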

You can use it as

    cat your_vhost_file | python search.py

where search.py is a Python file containing the script above. After running it, you'll have the content of the first block:

        ServerAdmin     mysite@domain.com
        ServerName      domain.com
        DocumentRoot    /path/to/my/doc/root

        RewriteEngine   On
        Include         include.d/global/rewrite.conf
        RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]

Note: This script can easily be adapted to list all matching sections in the file.
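
As a rough illustration of that kind of adaptation, here is a sketch that selects any block by position. It uses awk rather than the Python script above, to stay in shell; `nth_vhost`, `want`, `inside` and `count` are hypothetical names. Like the script, it prints only the lines between the tags.

```shell
#!/bin/bash
# nth_vhost is a hypothetical helper: nth_vhost FILE N prints the body of block N.
nth_vhost() {
    awk -v want="$2" '
        # Closing tag: stop if this was the wanted block, else leave the block.
        /<\/VirtualHost>/       { if (inside && count == want) exit; inside = 0; next }
        # Body line of the wanted block.
        inside && count == want { print }
        # Opening tag: enter a block and bump the block counter.
        /<VirtualHost/          { inside = 1; count++ }
    ' "$1"
}
```

Calling `nth_vhost your_vhost_file 1` gives the first block, and a loop over N would list them all.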

Saleem

Use Perl

Perl is part of the Linux Standard Base, and is also standard on OS X, so it should be highly available on most modern systems. Perl is great at multiline text tasks. For example:

    $ perl -ne '
          if (/VirtualHost/ ... m!/VirtualHost!) {
              print unless /VirtualHost/;
              exit if m!/VirtualHost!;
          }' /tmp/corpus

This one-liner will:

  1. Loop over the input file until it finds a VirtualHost block.
  2. Print every line within that block, excluding the starting or ending block tags.
  3. Exit the script when it sees the end of a VirtualHost block, ensuring that it only shows the first block.

Given your corpus, this will correctly yield:

           ServerAdmin     mysite@domain.com
           ServerName      domain.com
           DocumentRoot    /path/to/my/doc/root

           RewriteEngine   On
           Include         include.d/global/rewrite.conf
           RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]
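
If Perl is not an option, the three steps above map onto a sed range in much the same way. This is a sketch, not part of the answer; `vhost_body` is a hypothetical name.

```shell
#!/bin/bash
# vhost_body is a hypothetical helper: print the first block without its tags.
vhost_body() {
    # Within the first block: quit at the closing tag without printing it,
    # and print every other line except the opening tag.
    sed -n '/<VirtualHost/,/<\/VirtualHost>/{/<\/VirtualHost>/q;/<VirtualHost/!p;}' "$1"
}
```

Running `vhost_body /tmp/corpus` should give the same lines as the Perl version for the sample data.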
Todd A. Jacobs

It is possible with grep as seen here.

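Example that finds every match in an HTML file. Here -z makes grep treat the input as NUL-separated records, so the whole file is a single record, and (?s) lets . match newlines; -P needs a grep built with PCRE support:

    grep -Pazo "(?s)<div\s+class=\"version\">.*?Version\s+[\.0-9]+"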
scrat.squirrel