2

What regex can I write in bash to parse a line, extract the text found between two | characters (a sample line would be ex: 1: |hey| 2: |boy|), and keep those words in some sort of array?

Andy Lester
syker
  • Is your example "ex: 1: |hey| 2: |boy|" a sample LINE to parse or the RESULTS of parsing a line? If the latter, what is a sample line that would produce those results? I can think of a number of approaches but they depend on what your input looks like, and which approach is "best" depends on what you do next with the "array". – Stephen P Apr 08 '10 at 22:01
  • The example is a sample LINE. In fact, the example can be on new lines. – syker Apr 08 '10 at 22:02
  • What I want to do with the array is just print it out in a specially formatted order (say, with commas in between) and sort it as well. – syker Apr 08 '10 at 22:03

5 Answers

2

There is no need for a complicated regular expression. Split the line on "|"; every second element is then the text you want.

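#!/bin/bash
s="|hey| 2: |boy|"
# Split on "|" into an array; IFS is changed only for this one read.
IFS='|' read -r -a array <<< "$s"
# Odd indices (1, 3, ...) hold the text between a pair of pipes.
for ((i = 1; i < ${#array[@]}; i += 2))
do
  echo "${array[i]}"
done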

Output:

$ ./shell.sh
hey
boy

Using awk:

$ echo "|hey| 2: |boy|" | awk -F"|" '{for(i=2;i<=NF;i+=2)print $i}'
hey
boy
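
The asker's comments mention wanting the words kept in an array, sorted, and printed with commas in between. The following is a minimal sketch of one way to extend the split-on-"|" idea to do that; it is not part of the original answer, and the variable names (line, fields, words, joined) are made up for illustration.

```shell
#!/bin/bash
# Sketch: split on "|", keep the odd-indexed fields, then sort and comma-join.
line='1: |hey| 2: |boy|'

# Split the line on "|" (IFS is changed only for this one read).
IFS='|' read -r -a fields <<< "$line"

# Odd indices hold the text enclosed by a pair of pipes.
words=()
for ((i = 1; i < ${#fields[@]}; i += 2)); do
  words+=("${fields[i]}")
done

# Sort the words (one per line), then join the sorted lines with commas.
joined=$(printf '%s\n' "${words[@]}" | sort | paste -sd, -)
echo "$joined"
```

For the sample line this should print boy,hey. Words that themselves contain newlines would not survive the sort step, so this assumes single-line tokens.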
ghostdog74
  • +1 Nice use of IFS, set and (). But, this approach won't work if the left and right delimiters differ (say, '<' and '>') and the order is meaningful, or the delimiter were multi-character (say, "--"). A regex approach is more general/flexible, IMHO. – Kevin Little Apr 09 '10 at 03:59
  • Making it more flexible is not difficult either. Until the OP requires that, it will be left as it is. – ghostdog74 Apr 09 '10 at 04:17
1
$ foundall=$(echo '1: |hey| 2: |boy|' | sed -e 's/[^|]*|\([^|]\+\)|/\1 /g')
$ echo $foundall
hey boy
$ for each in ${foundall}
> do
>  echo ${each}
> done
hey
boy
Stephen P
0

Use sed -e 's,.*|\(.*\)|.*,\1,'

Dennis Williamson
syker
0

In your own answer, you output what's between the last pair of pipes (assuming there are more than two pipes on a line).

This will output what's between the first pair:

sed -e 's,[^|]*|\([^|]*\)|.*,\1,'

This will output what's between the outermost pair (so it will show pipes that appear between them):

sed -e 's,[^|]*|\(.*\)|.*,\1,'
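
To see the difference side by side, here is a small sketch that runs the three sed expressions from this thread against the same sample line. The sample line is taken from the question; the variable names (line, last, first, outer) are only illustrative.

```shell
#!/bin/bash
# Sketch: compare the three sed expressions on one sample line.
line='ex: 1: |hey| 2: |boy|'

# A greedy leading .* pushes the match to the last pair of pipes.
last=$(printf '%s\n' "$line" | sed -e 's,.*|\(.*\)|.*,\1,')

# [^|]* cannot cross a pipe, so the match stays on the first pair.
first=$(printf '%s\n' "$line" | sed -e 's,[^|]*|\([^|]*\)|.*,\1,')

# A greedy .* inside the group spans from the first pipe to the last one.
outer=$(printf '%s\n' "$line" | sed -e 's,[^|]*|\(.*\)|.*,\1,')

printf 'last:  %s\nfirst: %s\nouter: %s\n' "$last" "$first" "$outer"
```

On this input the three results should be boy, hey, and hey| 2: |boy respectively. Each expression returns at most one token per line, so none of them alone collects every token.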
Dennis Williamson
0
#!/bin/bash

_str="ex: 1: |hey| 2: |boy|"
_re='(\|[^|]*\|)(.*)'  # in group 1 collect 1st occurrence of '|stuff|';
                       # in group 2 collect remainder of line. 

# Loop while a token is still found; stop at the first failed match.
while [[ $_str =~ $_re ]]; do
   echo "Next token is '${BASH_REMATCH[1]}'"
   _str=${BASH_REMATCH[2]}
done

yields

Next token is '|hey|'
Next token is '|boy|'
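
If the surrounding pipes are not wanted and the tokens should end up in an array, as the question asks, the same loop can be adapted. This is a sketch under that assumption, not the answer's original code: the capture group is moved inside the pipes, and the names str, re, and tokens are chosen here for illustration.

```shell
#!/bin/bash
# Sketch: capture only the text inside each |...| pair and store it in an array.
str='ex: 1: |hey| 2: |boy|'
re='\|([^|]*)\|(.*)'   # group 1: text between the pipes; group 2: rest of line

tokens=()
while [[ $str =~ $re ]]; do
  tokens+=("${BASH_REMATCH[1]}")   # keep the token without its delimiters
  str=${BASH_REMATCH[2]}           # continue with the remainder of the line
done

printf '%s\n' "${tokens[@]}"
```

For the sample string this should leave hey and boy in tokens, in the order they appear on the line.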
Kevin Little