bash 4: Generic access to substring (n) of string by arbitrary delimiter?

Question

Let's assume I have the following string: x="number 1;number 2;number 3".

Access to the first substring is successful via ${x%%";"*}, and access to the last substring via ${x##*";"}:

$ x="number 1;number 2;number 3"
$ echo "front : ${x%%";"*}"  #front-most-part
front : number 1
$ echo "back  : ${x##*";"}"  #back-most-part
back  : number 3
$

How do I access the middle part (e.g., number 2)?
Is there a better way to do this if I have (many...) more parts than just three?
In other words: Is there a generic way of accessing substring No. n of string yyy, delimited by string xxx, where xxx is an arbitrary string/delimiter?

I have read How do I split a string on a delimiter in Bash?, but I specifically do not want to iterate over the string but rather directly access a given substring.

This specifically does not ask for a split into arrays, but into substrings.

Possible duplicate of [Split string into an array in Bash](http://stackoverflow.com/questions/10586153/split-string-into-an-array-in-bash) — Andreas Louv, Oct 27 '15 at 13:04
@Kalanidhi That _only_ accesses the second substring, in the end I'd like delimited access to all of them (this actually happens in a loop and generic access to any of the substrings is required). — Christian, Oct 27 '15 at 13:34
@dev-null No, I ask for a split into strings, not arrays (This actually happens within a loop over a multidimensional array, so I don't want a subarray of an array). — Christian, Oct 27 '15 at 13:35

score 4 · Accepted Answer · edited May 23 '17 at 12:03

With a fixed index:

x="number 1;number 2;number 3"

# Split input into fields by ';' and read the 2nd field into $f2
# Note the need for the *2nd* `unused`, otherwise f2 would 
# receive the 2nd field *plus the remainder of the line*.
IFS=';' read -r unused f2 unused <<<"$x"

echo "$f2"

Generically, using an array:

x="number 1;number 2;number 3"

# Split input into fields by ';' and read all resulting fields
# into an *array* (-a).
IFS=';' read -r -a fields <<<"$x"

# Access the desired field.
ndx=1
echo "${fields[ndx]}"

Constraints:

Using IFS, the special variable specifying the Internal Field Separator characters, invariably means:

- Only single, literal characters can act as field separators.
  - However, you can specify multiple characters, in which case any of them is treated as a separator.
- The default separator characters are $' \t\n' - i.e., space, tab, and newline - and runs of them (multiple contiguous instances) are always considered a single separator; e.g., 'a   b' has 2 fields - the multiple spaces count as a single separator.
- By contrast, with any other character, each character in a run is treated as its own separator, which produces empty fields; e.g., 'a;;b' has 3 fields - each ; is its own separator, so there's an empty field between the two ; characters.
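
The separator rules above can be checked with a few one-liners. This is only a small sketch; the array names (ws_fields, sc_fields, multi_fields) are made up for illustration.

```bash
# Whitespace separator: a run of spaces counts as ONE separator.
IFS=' ' read -r -a ws_fields <<<'a   b'
echo "${#ws_fields[@]}"      # 2

# Non-whitespace separator: each ';' separates on its own,
# so there is an empty field between the two ';' characters.
IFS=';' read -r -a sc_fields <<<'a;;b'
echo "${#sc_fields[@]}"      # 3
echo "[${sc_fields[1]}]"     # []

# Several separator characters: either ';' or ',' splits the input.
IFS=';,' read -r -a multi_fields <<<'a;b,c'
echo "${#multi_fields[@]}"   # 3
```

Because IFS is set only as a prefix to the read command, the shell's global IFS is left untouched.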

The read -r -a ... <<<... technique generally works well, as long as:

- the input is single-line
- you're not concerned about a trailing empty field getting discarded
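
Both caveats are easy to reproduce. The following is a small sketch with made-up variable names (multi, f1, f2), assuming the default line-based behavior of read:

```bash
# Caveat 1: read stops at the first newline, so later lines are lost.
multi=$'a;b\nc;d'
IFS=';' read -r -a f1 <<<"$multi"
echo "${#f1[@]}"   # 2 (only "a" and "b")

# Caveat 2: a trailing empty field is dropped.
IFS=';' read -r -a f2 <<<'a;b;'
echo "${#f2[@]}"   # 2, not 3
```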

If you need a fully generic, robust solution that addresses the issues above, use the following variation, which is explained in @gniourf_gniourf's answer to the linked question:

sep=';' 
IFS="$sep" read -r -d '' -a fields < <(printf "%s${sep}\0" "$x")

Note the need to use -d '' to read multi-line input all at once, and the need to terminate the input with another separator instance to preserve a trailing empty field; the trailing \0 is needed to ensure that read's exit code is 0.
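
For direct access to field n, the technique can be wrapped in a small function. This is only a sketch: nth_field is a hypothetical helper name (not a bash builtin), the index is assumed to be zero-based, and the separator is assumed to be a single non-whitespace character.

```bash
# nth_field is a hypothetical helper, not part of bash.
# Usage: nth_field SEP INDEX STRING   (INDEX is zero-based)
nth_field() {
  local sep=$1 ndx=$2 str=$3
  local -a fields
  # Append one extra separator so a trailing empty field survives,
  # and a NUL so that read -d '' finds its delimiter and returns 0.
  IFS="$sep" read -r -d '' -a fields < <(printf '%s%s\0' "$str" "$sep")
  printf '%s\n' "${fields[ndx]}"
}

x="number 1;number 2;number 3"
nth_field ';' 1 "$x"    # number 2
```

An out-of-range index prints an empty line here; a stricter variant could compare the index against "${#fields[@]}" first.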

Quicker than Stack Overflow lets me accept it as the correct answer ;) Thanks! — Christian, Oct 27 '15 at 13:00
Please use this to break a string: `IFS=';' read -r -d '' -a fields < <(printf '%s;\0' "$x")`. See my answer in the linked question. — gniourf_gniourf, Oct 27 '15 at 13:18
@gniourf_gniourf: Good point, I've updated the answer (and ++ for your linked answer). — mklement0, Oct 27 '15 at 13:41

score 0 · Answer 2 · Andreas Louv · 2015-10-27T22:37:45.380

Don't use:

~~Create an array with a delimiter of ;:~~

x="number 1;number 2;number 3"
_IFS=$IFS; IFS=';'
arr=($x)
IFS=$_IFS

echo ${arr[0]} # number 1
echo ${arr[1]} # number 2
echo ${arr[2]} # number 3

edited Oct 27 '15 at 22:37

answered Oct 27 '15 at 13:01

Andreas Louv


This is _the_ broken method to “split” a string on a delimiter (I should say, to introduce bugs) that is, unfortunately, widespread, as it is subject to pathname expansion (globbing). The “fix” is then to use `set -f` (but this doesn't fix _all_ the issues, as it will still concatenate multiple successive empty fields); and at this point you probably feel that you're fighting _against_ the shell: that's because you're using an _antipattern._ The linked question shows the canonical way to split a string: `IFS=\; read -r -d '' -a arr < <(printf '%s;\0' "$x")` (and it's only one line!). – gniourf_gniourf Oct 27 '15 at 13:14
When I say _it will still concatenate multiple successive empty fields_ I mean when used with a space as `IFS` (which is not the case here). Though in this case it will remove the last field if it's empty. – gniourf_gniourf Oct 27 '15 at 13:17
@gniourf_gniourf: The globbing argument is valid, and having to set and restore _2_ configuration items (`IFS`, `set -f`) makes this solution ultimately cumbersome (though it may have a slight performance advantage compared to `read ... <<<...` and presumably more so compared to `read ... < <(printf ...)`). However, from what I can tell, there's no escaping the shell considering _runs_ of tabs, spaces, newlines a _single_ separator (concatenating multiple successive empty fields) - this logic is built into `IFS`, so it affects the array-literal syntax as well as `read`. – mklement0 Oct 27 '15 at 14:40
@mklement0 this happens when `IFS` is set to a whitespace character, such as a space, a tab, or a newline (because Bash treats these chars in a special way). Try it with: `a='a  b'` (with 2 spaces between `a` and `b`) and then `IFS=' '; ary=( $a )`. You'll see that the empty fields are discarded. While this actually looks like what we'd want in general, it can be surprising when trying to slurp the output of a command into an array using `IFS=$'\n'; a=( $(echo a; echo; echo b) )`. Here the empty line isn't preserved… whether this is good or bad doesn't really matter; just something to be aware of! – gniourf_gniourf Oct 27 '15 at 15:55
@gniourf_gniourf: Fully agreed, and my updated answer describes all that. My point was: this behavior is _inevitable_ with `IFS` and is not a shortcoming of _this_, the `($x)` approach - I now suspect you never meant to imply that, that you were only pointing out a _general_ limitation - so I think we're in full agreement. – mklement0 Oct 27 '15 at 16:05
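
The globbing problem raised in these comments can be reproduced in a scratch directory. This is a minimal sketch; the file names and the variable names (tmpdir, broken, safe, old_ifs) are arbitrary choices for the demonstration, and default shell options (no `set -f`, no `nullglob`) are assumed.

```bash
# Work in a scratch directory so the glob has something to match.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch file1 file2

x='a;*;b'

# Unquoted expansion: after splitting, '*' undergoes pathname expansion.
old_ifs=$IFS; IFS=';'
broken=($x)
IFS=$old_ifs
echo "${#broken[@]}"   # 4: a file1 file2 b

# read -a splits on IFS but performs no pathname expansion.
IFS=';' read -r -d '' -a safe < <(printf '%s;\0' "$x")
echo "${#safe[@]}"     # 3: a * b

# Clean up the scratch directory.
cd "$OLDPWD" && rm -rf "$tmpdir"
```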
