1

I'd like to alphabetically sort the lines between two patterns in a Bash shell script.

Given the following input file:

aaa
bbb
PATTERN1
foo
bar
baz
qux
PATTERN2
ccc
ddd

I expect as output:

aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd

The preferred tool is an AWK "one-liner", but sed and other solutions are also accepted. An explanation would be appreciated.

Alex Harvey
mike
  • 1
    It's not duplicate, method is different - earlier solution is not applicable for this scenario. – mike Nov 27 '15 at 13:20

8 Answers

10

This is a perfect case to use asort() to sort an array in GNU awk:

gawk '/PATTERN1/ {f=1; delete a}
      /PATTERN2/ {f=0; n=asort(a); for (i=1;i<=n;i++) print a[i]}
      !f
      f{a[$0]=$0}' file

This uses logic similar to "How to select lines between two marker patterns which may occur multiple times with awk/sed", with the addition that it:

  • Prints lines outside this range
  • Stores lines within this range
  • And when the range is over, sorts and prints them.

Detailed explanation:

  • /PATTERN1/ {f=1; delete a}: on a line matching PATTERN1, turn the flag on and clear the array of lines.
  • /PATTERN2/ {f=0; n=asort(a); for (i=1;i<=n;i++) print a[i]}: on a line matching PATTERN2, turn the flag off, then sort the array a[] containing all the lines in the range and print them.
  • !f: if the flag is off (that is, outside the range), the condition evaluates to true, so the line is printed.
  • f{a[$0]=$0}: if the flag is on, store the line in the array a[] so that it can be used later on.
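
As the comments note, asort() is specific to gawk. In addition, keying the array by $0 collapses identical lines into a single entry. A minimal sketch of a variant for any POSIX awk, under these assumptions: sort_block is a hypothetical helper name, lines are stored under a running counter so duplicates survive, and a hand-written insertion sort stands in for asort() (a swapped-in technique, not part of the answer above):

```sh
#!/bin/sh
# Sketch only: sort_block is a hypothetical helper, not part of the original answer.
# usage: sort_block FILE
sort_block() {
    awk '
    /PATTERN2/ && f {
        # insertion sort on a[1..n]; appending "" forces string comparison
        for (i = 2; i <= n; i++) {
            v = a[i]
            for (j = i - 1; j >= 1 && (a[j] "") > (v ""); j--) a[j + 1] = a[j]
            a[j + 1] = v
        }
        for (i = 1; i <= n; i++) print a[i]
        f = 0
    }
    f { a[++n] = $0; next }        # inside the block: buffer the line, duplicates included
    { print }                      # outside the block, and both marker lines
    /PATTERN1/ { f = 1; n = 0 }    # start buffering on the line after the first marker
    ' "$1"
}
```

Insertion sort is quadratic, so this only suits blocks of modest size; for large blocks, piping the lines to the external sort command is likely the better choice.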

Test

▶ gawk '/PATTERN1/ {f=1} /PATTERN2/ {f=0; n=asort(a); for (i=1;i<=n;i++) print a[i]} !f; f{a[$0]=$0}' FILE
aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd
Community
fedorqui
  • 1
    Note that asort is a function of **gawk**, not awk. – ghoti Nov 26 '15 at 23:06
  • 1
    @ghoti yep, I linked the GNU awk documentation. However, I just edited to make it more explicit. Thanks! – fedorqui Nov 26 '15 at 23:07
  • @fedorqui I wonder why did U use a[$0]=$0 instead of for example a[NR]=$0 ? It does not matter at all? Just wonder what values get a[?] in array :) if U use $0? – mike Nov 27 '15 at 14:21
  • @mike you are right, `a[$0]` is not necessary since the indexes are also reset by `asort()`. So `a[NR]` is perfectly fine. – fedorqui Nov 27 '15 at 14:31
  • @fedorqui is this possible to sort by 7th column? I tried adding in for (i=1;i<=n;i++) print a[i] | "sort -k7" } it sorts that well but output of sorted lines is after all printing below "123 456". – mike Nov 28 '15 at 21:46
  • @mike no, [`asort()`](https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html#String-Functions) does not support this. You may want to use Glenn Jackman's answer for this, since he uses the external `sort` command. – fedorqui Nov 29 '15 at 12:20
4

You can use sed with head and tail:

{
    sed '1,/^PATTERN1$/!d' FILE
    sed '/^PATTERN1$/,/^PATTERN2$/!d' FILE | head -n-1 | tail -n+2 | sort
    sed '/^PATTERN2$/,$!d' FILE
} > output

The first line prints everything from the 1st line to PATTERN1.

The second line takes the lines between PATTERN1 and PATTERN2, removes the last and first line, and sorts the remaining lines.

The third line prints everything from PATTERN2 to the end of the file.
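
The negative count in head -n-1 is a GNU extension, so the middle line may fail with other head implementations. A sketch of the same three-pass idea that trims the two marker lines with sed '1d;$d' instead; sort_between is a hypothetical function name, and the file path is taken as its first argument:

```sh
#!/bin/sh
# Sketch only: sort_between is a hypothetical helper name.
# usage: sort_between FILE > output
sort_between() {
    file=$1
    # 1) everything from the first line through PATTERN1
    sed '1,/^PATTERN1$/!d' "$file"
    # 2) the block itself: drop its first line (PATTERN1) and last line (PATTERN2), then sort
    sed '/^PATTERN1$/,/^PATTERN2$/!d' "$file" | sed '1d;$d' | sort
    # 3) everything from PATTERN2 through the end of the file
    sed '/^PATTERN2$/,$!d' "$file"
}
```

Like the original, this assumes a single PATTERN1/PATTERN2 pair and that PATTERN1 is not on the first line of the file.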

Alex Harvey
choroba
3

More complicated, but may ease the memory load of storing lots of lines (your cfg file would have to be pretty huge for this to matter, but nevertheless...). Using GNU awk and a sort coprocess:

gawk -v p=1 '
    /^PATTERN2/ {          # when we see the 2nd marker:

        # close the "write" end of the pipe to sort. Then sort will know it
        # has all the data and it can begin sorting
        close("sort", "to");

        # then sort will print out the sorted results, so read and print that
        while (("sort" |& getline line) >0) print line 

        # and turn the boolean back to true
        p=1
    }
    p  {print}             # if p is true, print the line
    !p {print |& "sort"}   # if p is false, send the line to `sort`
    /^PATTERN1/ {p=0}      # when we see the first marker, turn off printing
' FILE
Alex Harvey
glenn jackman
  • tell me if I want to modify this to sort by for example 7th column what should I do? Simple modifying "sort -k7" is not working. Ok seems I have to modify every instance of sort :) Now it works. – mike Nov 30 '15 at 08:21
  • You could pass `-v cmd="sort -k7"` and then replace all `"sort"` with `cmd` – glenn jackman Nov 30 '15 at 11:31
2

It's a little unconventional but using Vim:

vim -c 'exe "normal /PATTERN1\<cr>jV/PATTERN2\<cr>k: ! sort\<cr>" | wq!' FILE

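Where \<cr> is Vim's notation for a carriage return (the Enter key). Inside the double-quoted string passed to exe it can be typed literally as shown, so entering CTRL-v then CTRL-M is not required.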

Further explanation:

  • Using vim normal mode,
  • /PATTERN1\<cr> - search for the first pattern
  • j - go to the next line
  • V - enter visual mode
  • /PATTERN2\<cr> - search for the second pattern
  • k - go back one line
  • : ! sort\<cr> - sort the visual text you just selected
  • wq! - save and exit
Alex Harvey
1

Obviously this is inferior to the GNU AWK solution, but all the same, this is a GNU sed solution:

sed '
/PATTERN1/,/PATTERN2/ {
  /PATTERN1/b    # branch/break if /PATTERN1/. This line is printed
  /PATTERN2/ {   # if /PATTERN2/,
    x                    # swap hold and pattern spaces
    s/^\n//              # delete the leading newline. The first H puts it there
    s/.*/sort <<< "&"/e  # sort the pattern space by calling Unix sort
    p                    # print the sorted pattern space
    x                    # swap hold and pattern space again to retrieve PATTERN2
    p                    # print it also
  }
  H   # Append the pattern space to the hold space
  d   # delete this line for now - it will be printed in the block above
}
' FILE

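Note that I rely on the e flag of the s command, a GNU extension. The generated command is run by the shell that sed invokes (sh), so that shell must also support the <<< here-string.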

Testing:

▶ gsed '
/PATTERN1/,/PATTERN2/ {
  /PATTERN1/b
  /PATTERN2/ {
    x
    s/^\n//; s/.*/sort <<< "&"/ep
    x
    p
  }
  H
  d
}
' FILE
aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd
Alex Harvey
1

Here is a small and easy to understand shell script for sorting lines between two patterns:

#!/bin/bash

in_file=$1
out_file=$2

temp_file_for_sort="$out_file.temp.for_sort"
curr_state=0          # 0 = outside a block, 1 = inside a block
in_between_count=0

rm -f "$out_file" "$temp_file_for_sort"

while IFS='' read -r line; do

if (( curr_state == 0 )); then
    # Outside a block: write this line to the output
    printf '%s\n' "$line" >> "$out_file"
    if [[ $line == "PATTERN_START" ]]; then
        # Start of block: reset the buffer
        rm -f "$temp_file_for_sort"
        in_between_count=0
        curr_state=1
    fi
else
    if [[ $line != PATTERN_END* ]]; then
        # Line inside block - to be sorted
        printf '%s\n' "$line" >> "$temp_file_for_sort"
        in_between_count=$(( in_between_count + 1 ))
    else
        # End of block
        curr_state=0

        if (( in_between_count != 0 )); then
            sort -o "$temp_file_for_sort" "$temp_file_for_sort"
            cat "$temp_file_for_sort" >> "$out_file"
            rm -f "$temp_file_for_sort"
        fi
        printf '%s\n' "$line" >> "$out_file"
    fi
fi

done < "$in_file"

# If something remains (PATTERN_END was missing), append it unsorted
if [ -f "$temp_file_for_sort" ]; then
    cat "$temp_file_for_sort" >> "$out_file"
fi
rm -f "$temp_file_for_sort"

Usage: <script_path> <input_file> <output_file>.

The patterns are hardcoded in the script and can be changed as required (or taken as arguments). The script also creates a temporary file to hold the lines being sorted (<output_file>.temp.for_sort).

Algorithm:

Start with state = 0 and read the file line by line.

In state 0, the line is written to the output file, and if it is PATTERN_START, the state is set to 1.

In state 1, if the line is not PATTERN_END, it is written to the temporary file.

In state 1, if the line is PATTERN_END, the temporary file is sorted, its contents are appended to the output file (and the temporary file is removed), and PATTERN_END is written to the output file. The state is then set back to 0.

Finally, if anything is left in the temporary file (the case where PATTERN_END is missing), its contents are written to the output file.

0

Along the lines of the solution proposed by @choroba, using GNU sed (depends on Q command):

{
  sed -n '1,/PATTERN1/p' FILE
  sed   '1,/PATTERN1/d; /PATTERN2/Q' FILE | sort
  sed -n '/PATTERN2/,$p' FILE
}

Explanation:

  • The p command prints the lines in a range: '1,/PATTERN1/p' prints from line 1 through the first line matching PATTERN1, and '/PATTERN2/,$p' prints from the first line matching PATTERN2 through the end of the file ($).
  • The -n option disables the default behaviour of printing every line, which is why it is used together with p.
  • In the middle line, the d command deletes lines 1 through the first match of /PATTERN1/, and Q (quit without printing, GNU sed only) stops at the first line matching /PATTERN2/. What remains are the lines to be sorted, which are fed into sort.
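
Where Q is unavailable, a second d over the range /PATTERN2/,$ produces the same output, at the cost of reading the file to the end instead of quitting early. A sketch using only POSIX sed features; sort_between_portable is a hypothetical function name:

```sh
#!/bin/sh
# Sketch only: sort_between_portable is a hypothetical helper name.
# usage: sort_between_portable FILE
sort_between_portable() {
    # head of the file, through the first PATTERN1 line
    sed -n '1,/PATTERN1/p' "$1"
    # delete the head and the tail; only the lines between the markers reach sort
    sed '1,/PATTERN1/d; /PATTERN2/,$d' "$1" | sort
    # tail of the file, from the first PATTERN2 line onward
    sed -n '/PATTERN2/,$p' "$1"
}
```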
Alex Harvey
0

This can also be done with non-GNU awk and the system sort command, which makes it work on both macOS and Linux.

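awk -v SP='PATTERN1' -v EP='PATTERN2' -v cmd=sort '{
if (match($0, SP)>0) {flag=1}                # start marker: buffer the lines that follow
else if (match($0, EP)>0) {                  # end marker: send the buffer through sort
   for (j=0;j<i;j++) {print a[j]|cmd}        # i is the number of buffered lines
   close(cmd); delete a; i=0; flag=0}        # close() waits until sort has finished
else if (flag==1) {a[i++]=$0; next}          # inside the block: store the line, print nothing
print $0                                     # marker lines and lines outside the block
}' FILE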

Output:

aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd
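
Since the sort command is passed in through cmd, the same approach can sort on a key, and the lines can be piped to sort as they arrive instead of being buffered in an array. A sketch under those assumptions: sort_block_by is a hypothetical function, the key sort -k2,2 in the usage line is only an illustration, and the fflush() call is a precaution so that awk's own buffered output is written before sort produces any:

```sh
#!/bin/sh
# Sketch only: sort_block_by is a hypothetical helper name.
# usage: sort_block_by 'sort -k2,2' FILE
sort_block_by() {
    awk -v SP='PATTERN1' -v EP='PATTERN2' -v cmd="$1" '
    flag && $0 ~ EP { close(cmd); flag = 0 }   # end marker: sort writes its output on close
    flag            { print | cmd; next }      # inside the block: stream the line to sort
                    { print }                  # everything else, marker lines included
    $0 ~ SP         { flag = 1; fflush() }     # start marker: flush, then begin streaming
    ' "$2"
}
```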
alex