How can I extract text from between two delimiters on a line in bash?

Question

What regex can I write in bash to parse a line and extract the text found between two | characters (so that would be ex: 1: |hey| 2: |boy|), keeping those words in some sort of array?

Is your example "ex: 1: |hey| 2: |boy|" a sample LINE to parse or the RESULTS of parsing a line? If the latter, what is a sample line that would produce those results? I can think of a number of approaches but they depend on what your input looks like, and which approach is "best" depends on what you do next with the "array". — Stephen P, Apr 08 '10 at 22:01
the example is a sample LINE. in fact the example can be on new lines. — syker, Apr 08 '10 at 22:02
what i want to do with the array is to just print it out in a special formatted order (like say commas in between) and sort it as well — syker, Apr 08 '10 at 22:03

score 2 · Accepted Answer · answered Apr 09 '10 at 00:08


No need for a complicated regular expression. Split on "|"; every second element is then what you want.

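#!/bin/bash
declare -a array
s="|hey| 2: |boy|"
IFS="|"        # split on "|" from here on
set -- $s      # left unquoted on purpose so the shell splits $s into fields
array=($@)
# the odd indexes hold the text between each pair of pipes
for ((i=1; i<${#array[@]}; i+=2))
do
  echo "${array[$i]}"
done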

Output:

$ ./shell.sh
hey
boy

Using awk:

$ echo "|hey| 2: |boy|" | awk -F"|" '{for(i=2;i<=NF;i+=2)print $i}'
hey
boy
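
A variation on the same split-on-"|" idea, sketched on the assumption that the goal is the sorted, comma-separated list the asker mentions in the question comments. It uses read -ra so that IFS changes only for that one command. The names fields, words, sorted and joined are made up for illustration, and mapfile needs bash 4 or later.

```shell
#!/usr/bin/env bash
s="|hey| 2: |boy|"

# Split on "|" for this one read only; the global IFS is left untouched.
IFS='|' read -r -a fields <<< "$s"

# Field 0 is whatever precedes the first pipe (empty here), so the odd
# indexes (1, 3, ...) hold the text between each pair of pipes.
words=()
for ((i = 1; i < ${#fields[@]}; i += 2)); do
    words+=("${fields[i]}")
done

# Sort the words, then join them with commas.
mapfile -t sorted < <(printf '%s\n' "${words[@]}" | sort)
joined=$(IFS=,; echo "${sorted[*]}")
echo "$joined"    # prints: boy,hey
```

The same loop should also work on the asker's full sample line, since "ex: 1: " simply becomes field 0.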


ghostdog74


+1 Nice use of IFS, set and (). But, this approach won't work if the left and right delimiters differ (say, '<' and '>') and the order is meaningful, or the delimiter were multi-character (say, "--"). A regex approach is more general/flexible, IMHO. – Kevin Little Apr 09 '10 at 03:59
to make it more flexible is not difficult either. until that is required by OP, it will be left as it is. – ghostdog74 Apr 09 '10 at 04:17
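
To illustrate the case raised in the comment above, here is a minimal sketch for differing left and right delimiters, assuming '<' and '>' and using bash's =~ operator with BASH_REMATCH. The sample line and the names line, re and found are invented for the example; this is one possible approach, not something either commenter posted.

```shell
#!/usr/bin/env bash
line='1: <hey> 2: <boy>'
re='<([^<>]*)>(.*)'   # group 1: text inside the first <...>; group 2: the rest

found=()
# Keep matching against the remainder until no <...> pair is left.
while [[ $line =~ $re ]]; do
    found+=("${BASH_REMATCH[1]}")
    line=${BASH_REMATCH[2]}
done

printf '%s\n' "${found[@]}"
```

Keeping the regex in a variable and leaving it unquoted on the right of =~ makes bash treat it as a pattern rather than a literal string.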

score 1 · Answer 2 · answered Apr 08 '10 at 22:33


$ foundall=$(echo '1: |hey| 2: |boy|' | sed -e 's/[^|]*|\([^|]\+\)|/\1 /g')
$ echo $foundall
hey boy
$ for each in ${foundall}
> do
>  echo ${each}
> done
hey
boy
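
One caveat with looping over an unquoted ${foundall}: the shell splits it on whitespace, so a token such as |hey you| would come out as two words. A possible workaround, sketched below, emits one match per line with grep -o and reads the lines into an array with mapfile (bash 4 or later). The -o option is not in POSIX but is available in GNU and BSD grep; the sample line and the name tokens are invented for illustration.

```shell
#!/usr/bin/env bash
line='1: |hey you| 2: |boy|'

# grep -o prints each |...| match on its own line; tr then strips the pipes.
mapfile -t tokens < <(printf '%s\n' "$line" | grep -o '|[^|]*|' | tr -d '|')

# Quoting the array expansion keeps "hey you" together as one element.
for each in "${tokens[@]}"; do
    echo "[$each]"
done
```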


Stephen P


score 0 · Answer 3 · edited Apr 08 '10 at 22:40


Use sed -e 's,.*|\(.*\)|.*,\1,'

edited Apr 08 '10 at 22:40

Dennis Williamson


answered Apr 08 '10 at 22:21

syker


score 0 · Answer 4 · answered Apr 08 '10 at 22:45

In your own answer, you output what's between the last pair of pipes (assuming there are more than two pipes on a line).

This will output what's between the first pair:

sed -e 's,[^|]*|\([^|]*\)|.*,\1,'

This will output what's between the outermost pair (so it will show pipes that appear between them):

sed -e 's,[^|]*|\(.*\)|.*,\1,'
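
To make the difference between the three expressions concrete, the sketch below runs each of them on one line that has two pairs of pipes. The sample line and the variable names last, first and outer are invented for the comparison.

```shell
#!/usr/bin/env bash
line='a |one| b |two| c'

# Greedy leading .* pushes the match to the last pair of pipes.
last=$(printf '%s\n' "$line" | sed -e 's,.*|\(.*\)|.*,\1,')
# [^|]* cannot cross a pipe, so this stops at the first pair.
first=$(printf '%s\n' "$line" | sed -e 's,[^|]*|\([^|]*\)|.*,\1,')
# First pipe on the left, greedy .* up to the last pipe on the right.
outer=$(printf '%s\n' "$line" | sed -e 's,[^|]*|\(.*\)|.*,\1,')

echo "last pair:      $last"     # two
echo "first pair:     $first"    # one
echo "outermost pair: $outer"    # one| b |two
```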

score 0 · Answer 5 · answered Apr 08 '10 at 22:58

#!/bin/bash

_str="ex: 1: |hey| 2: |boy|"
_re='(\|[^|]*\|)(.*)'  # in group 1 collect 1st occurrence of '|stuff|';
                       # in group 2 collect remainder of line. 

while [[ -n $_str ]];do
   [[ $_str =~ $_re ]]
   [[ -n ${BASH_REMATCH[1]} ]] && echo "Next token is '${BASH_REMATCH[1]}'"
   _str=${BASH_REMATCH[2]}
done

yields

Next token is '|hey|'
Next token is '|boy|'
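
If the pipes themselves are not wanted, a small variation (a sketch, not part of the original answer) moves the capture group inside the delimiters and collects the matches in an array, which is closer to what the question asks for. The array name _tokens is made up for illustration.

```shell
#!/usr/bin/env bash
_str="ex: 1: |hey| 2: |boy|"
_re='\|([^|]*)\|(.*)'   # group 1: text between the pipes; group 2: remainder

_tokens=()
# Loop while a |...| pair is still found in what is left of the line.
while [[ $_str =~ $_re ]]; do
    _tokens+=("${BASH_REMATCH[1]}")
    _str=${BASH_REMATCH[2]}
done

printf "Next token is '%s'\n" "${_tokens[@]}"
```

This should print 'hey' and 'boy' without the surrounding pipes.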

edited Apr 08 '10 at 23:11

answered Apr 08 '10 at 22:58

Kevin Little

