Extract all Strings between two characters to array using Bash

Question

I searched for hours but can't find a solution to extract all strings between two characters into an array using Bash.

I found

sed -n 's/.*\[\(.*\)\].*/\1/p'

But this only shows me the last entry.

My string looks like:

var="[a1] [b1] [123] [Text text] [0x0]"

I want an array like this:

arr[0]="a1"
arr[1]="b1"
arr[2]="123"
arr[3]="Text text"
arr[4]="0x0"

So I am searching for strings between [ and ] and want to load them into an array without the [ and ].

Thank you for helping!

Since Stack Overflow hides the Close reason from you: *Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve).* — jww, Apr 19 '18 at 09:21
@jww: The question describes the problem, includes a coding attempt (using `sed`), and explains why it doesn't work (it extracts only the last element). So the close reason "lack of debugging" isn't applicable here. There could be other close reasons, but I find this question good. — Tsyvarev, Apr 19 '18 at 13:00

chepner · Answer 1 · 2018-04-18T15:08:39.580

There's no simple way to do it. I would use a loop to extract them one at a time:

var="[a1] [b1] [123] [Text text] [0x0]"
regex='\[([^]]*)\](.*)'
while [[ $var =~ $regex ]]; do
  arr+=("${BASH_REMATCH[1]}")
  var=${BASH_REMATCH[2]}
done

In the regular expression, \[([^]]*)\] captures everything after the first [ up to (but not including) the next ]. (.*) captures everything after that for the next iteration.
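To make the two capture groups concrete, here is a minimal sketch of a single match against the sample string, outside the loop. The variable names first and rest are illustrative only and are not part of the answer's code.

```bash
#!/usr/bin/env bash
var="[a1] [b1] [123] [Text text] [0x0]"
regex='\[([^]]*)\](.*)'

if [[ $var =~ $regex ]]; then
  first=${BASH_REMATCH[1]}   # text inside the first [...] pair
  rest=${BASH_REMATCH[2]}    # everything after that pair's closing ]
fi

printf 'first=<%s>\n' "$first"   # first=<a1>
printf 'rest=<%s>\n' "$rest"     # rest=< [b1] [123] [Text text] [0x0]>
```

Each pass of the loop feeds the remainder back in as the new input, so the match always starts at the next bracket pair, and the loop stops once no pair remains.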

You can use declare -n in bash 4.3 or later to make this look a little less intimidating.

declare -n m1=BASH_REMATCH[1] m2=BASH_REMATCH[2]
regex='\[([^]]*)\](.*)'

var="[a1] [b1] [123] [Text text] [0x0]"
while [[ $var =~ $regex ]]; do
  arr+=("$m1")
  var=$m2
done

karakfa · Answer 2 · 2018-04-18T15:15:22.283

$ IFS=, read -ra arr < <(sed 's/^\[//;s/]$//;s/] \[/,/g' <<< "$var"); echo "${arr[3]}"

Text text

edited Apr 18 '18 at 15:15

answered Apr 18 '18 at 14:51


score -1 · Answer 3 · answered Apr 18 '18 at 15:36

With GNU awk for multi-char RS and RT and newer versions of bash for mapfile:

$ mapfile -t arr < <(echo "$var" | awk -v RS='[^][]+' 'NR%2{print RT}')

$ declare -p arr
declare -a arr=([0]="a1" [1]="b1" [2]="123" [3]="Text text" [4]="0x0")

Ed Morton


score -1 · Accepted Answer · answered Apr 18 '18 at 16:27

There are a lot of suggestions here already that may work for you, but whether they do depends on your data. For example, replacing your current field separator of ] [ with a comma works unless you have commas embedded in your fields. Your sample data does not have any, but one never knows. :)

An ideal solution would be to use something as a field separator that is guaranteed never to be part of your field, like a null. But that's hard to do in a portable way (i.e. without knowing what tools are available). So a less extreme stance might be to use a newline as a separator:

var="[a1] [b1] [123] [Text text] [0x0]"

mapfile -t arr < <(sed $'s/^\[//;s/] \[/\\\n/g;s/]$//' <<<"$var")

declare -p arr

which would result in:

declare -a arr='([0]="a1" [1]="b1" [2]="123" [3]="Text text" [4]="0x0")'

This is functionally equivalent to the awk solution that Inian provided. Note that mapfile requires bash version 4 or above.
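If only bash itself is assumed, the null-separator idea mentioned above can be sketched as follows. This is one possible approach under stated assumptions, not part of the original answer: the helper name emit_fields is hypothetical, it reuses the regex loop from the first answer to emit each field followed by a NUL byte, and it reads the fields back with a read loop, which avoids the bash 4 requirement of mapfile.

```bash
#!/usr/bin/env bash
# Sample input; one field deliberately contains a newline.
var=$'[a1] [b1] [Text text] [two\nlines] [0x0]'

# Hypothetical helper: print every [...] field terminated by a NUL byte.
emit_fields() {
  local rest=$1 regex='\[([^]]*)\](.*)'
  while [[ $rest =~ $regex ]]; do
    printf '%s\0' "${BASH_REMATCH[1]}"
    rest=${BASH_REMATCH[2]}
  done
}

# Read the NUL-delimited fields back into an array.
arr=()
while IFS= read -r -d '' field; do
  arr+=("$field")
done < <(emit_fields "$var")

declare -p arr
```

Because a bash string cannot contain a NUL byte, the separator can never collide with field content, so fields holding commas or newlines survive intact.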

That said, you could also do this exclusively within bash, without relying on any external tools like sed:

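# Split on whitespace without pathname expansion; an unquoted $var would let [a1] act as a glob.
read -ra arr <<< "$var"

last=0
for i in "${!arr[@]}"; do
  # A word that does not open with [ continues the previous field.
  if [[ ${arr[$i]} != \[* ]]; then
    arr[$last]="${arr[$last]} ${arr[$i]}"
    unset "arr[$i]"
    continue
  fi
  last=$i
done

# Strip the leading [ and trailing ] from each field.
for i in "${!arr[@]}"; do
  arr[$i]="${arr[$i]:1:$((${#arr[$i]}-2))}"
done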

At this point, declare -p arr results in:

declare -a arr='([0]="a1" [1]="b1" [2]="123" [3]="Text text" [5]="0x0")'

This sucks your $var into the array $arr[] with fields separated by whitespace, then it collapses the fields based on whether they begin with a square bracket. It then goes through the fields and replaces them with the substring that eliminates the first and last character. It may be a little less resilient and harder to read, but it's all within bash. :)
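Note the gap in the result above: index 4 was unset, so the last element sits at index 5. If contiguous indices matter, re-assigning the array from its own expansion renumbers it from 0. A minimal sketch, starting from the state shown above:

```bash
#!/usr/bin/env bash
# Sparse array as left behind by the merge loop (index 4 is missing).
arr=([0]="a1" [1]="b1" [2]="123" [3]="Text text" [5]="0x0")

# Rebuild the array from its values; indices are assigned afresh from 0.
arr=("${arr[@]}")

echo "${!arr[@]}"   # 0 1 2 3 4
```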

Thank you all. I tested them all, and the other answers work great too. But I use your code " mapfile -t arr < <(sed $'s/^\[//;s/] \[/\\\n/g;s/]$//' <<<"$var") " because it is the shortest (I don't know if it is the best). :-D — suf noK, Apr 19 '18 at 14:44
