29

Say I have a bash array (e.g. the array of all parameters) and want to delete all parameters matching a certain pattern or alternatively copy all remaining elements to a new array. Alternatively, the other way round, keep elements matching a pattern.

An example for illustration:

x=(preffoo bar foo prefbaz baz prefbar)

and I want to delete everything starting with pref in order to get

y=(bar foo baz)

(the order is not relevant)

What if I want the same thing for a list of words separated by whitespace?

x="preffoo bar foo prefbaz baz prefbar"

and again delete everything starting with pref in order to get

y="bar foo baz"
Brian Tompsett - 汤莱恩
kynan

6 Answers

29

Filtering an array is tricky once you consider the possibility of elements containing spaces (not to mention even "weirder" characters). In particular, the answers given so far (which use various forms of ${x[@]//pref*/}) will fail with such arrays.

I have investigated this issue somewhat and found a solution. It is not a nice one-liner, but at least it is a solution.

For the examples, let's assume ARR names the array we want to filter. We start with the core expression:

for index in "${!ARR[@]}" ; do [[ …condition… ]] && unset -v 'ARR[$index]' ; done
ARR=("${ARR[@]}")

There are already a few points worth mentioning:

  1. "${!ARR[@]}" evaluates to the indexes of the array (as opposed to its elements).
  2. The form "${!ARR[@]}" is mandatory. Do not drop the quotes or change @ to *, or the expression will break, for example on associative arrays whose keys contain spaces.
  3. The part after do can be whatever you want. The only requirement is that you unset, as shown, the elements that you don't want to keep in the array.
  4. Use -v and quote the argument to unset. -v makes unset act on variables only, and the quotes keep the shell from treating ARR[$index] as a glob pattern that could match a file name in the current directory (for example a file named ARR1).
  5. If the part after do is as suggested above, you can use either && or || to remove the elements that pass or that fail the condition, respectively.
  6. The second line, the reassignment of ARR, is needed only with non-associative arrays and will break associative arrays. (I did not quickly come up with a generic expression that handles both, and I don't need one…) For ordinary arrays it is needed if you want consecutive indexes, because unset on an array element does not shift the elements at higher indexes down by one; it just leaves a hole in the indexes. If you only iterate over the array (or expand it as a whole) this is no problem, but in other cases you need to reassign the indexes. Note also that any hole that existed in the indexes before will be removed as well. So if you need to preserve existing holes, more logic is needed besides the unset and the final reassignment.
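
The hole left by unset (point 6) can be seen in a minimal sketch. The array contents here are made up for illustration and are not part of the original example:

```shell
#!/usr/bin/env bash
# Illustrative array; the element values are arbitrary.
ARR=(zero one two three)

# Remove the element at index 1. The higher indexes are NOT shifted down.
unset -v 'ARR[1]'
echo "${!ARR[@]}"    # prints: 0 2 3

# Re-create the array from its remaining elements to close the hole.
ARR=("${ARR[@]}")
echo "${!ARR[@]}"    # prints: 0 1 2
```

After the reassignment, ${ARR[1]} holds the element that previously sat at index 2.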

Now for the condition. The [[ ]] expression is an easy way if you can use it. (See here.) In particular, it supports regular expression matching with Extended Regular Expressions. (See here.) Also be careful about using grep or any other line-based tool for this if you expect array elements to contain not only spaces but also newlines. (A very nasty file name could contain a newline character, I think…)


Referring to the question itself the [[ ]] expression would have to be:

[[ ${ARR[$index]} =~ ^pref ]]

(with && unset as above)
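
For the opposite case from the question (keeping only the elements that match), the same loop can be used with || instead of &&. This is a small sketch on an illustrative array, not a tested general-purpose tool:

```shell
#!/usr/bin/env bash
# Illustrative data, including one element with spaces.
ARR=(preffoo bar foo prefbaz baz prefbar "pref with spaces")

for index in "${!ARR[@]}"; do
    # "||" unsets the element when the match FAILS, so only matches survive.
    [[ ${ARR[$index]} =~ ^pref ]] || unset -v 'ARR[$index]'
done

# Close the holes left by unset (indexed arrays only).
ARR=("${ARR[@]}")
declare -p ARR
```

Four elements remain here: preffoo, prefbaz, prefbar and "pref with spaces", the last one still a single element.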


Let's finally see how this works with those difficult cases. First we construct the array:

declare -a ARR='([0]="preffoo" [1]="bar" [2]="foo" [3]="prefbaz" [4]="baz" [5]="prefbar" [6]="pref with spaces")'
ARR+=($'pref\nwith\nnew line')
ARR+=($'\npref with new line before')

we can see that we have all the complex cases by running declare -p ARR and getting:

declare -a ARR='([0]="preffoo" [1]="bar" [2]="foo" [3]="prefbaz" [4]="baz" [5]="prefbar" [6]="pref with spaces" [7]="pref
with
new line" [8]="
pref with new line before")'

Now we run the filter expression:

for index in "${!ARR[@]}" ; do [[ ${ARR[$index]} =~ ^pref ]] && unset -v 'ARR[$index]' ; done

and another check (declare -p ARR) gives the expected result:

declare -a ARR='([1]="bar" [2]="foo" [4]="baz" [8]="
pref with new line before")'

Note how all elements starting with pref were removed but the indexes did not change. Note also that ${ARR[8]} is still there, since it starts with a newline rather than with pref.

Now for the final reassignment:

ARR=("${ARR[@]}")

and check (declare -p ARR):

declare -a ARR='([0]="bar" [1]="foo" [2]="baz" [3]="
pref with new line before")'

which is exactly what was expected.


Some closing notes. It would be nice if this could be turned into a flexible one-liner, but I don't think there is a way to make it shorter and simpler than it is now without defining functions or the like.

As for a function, it would be nice to have one that accepts an array, returns an array, and has an easy-to-configure test to exclude or keep elements. But I'm not good enough with Bash to write it now.
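
One possible shape for such a function is sketched below. It is not from the original answer: the name filter_out_matching and its argument order are hypothetical, and it assumes Bash 4.3 or newer for namerefs (local -n) and an indexed array. Instead of unset, it collects the elements to keep into a temporary array and assigns that back.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original answer).
# Usage: filter_out_matching ARRAY_NAME REGEX
# Removes, in place, every element of the named indexed array that
# matches the extended regular expression REGEX. Requires Bash 4.3+.
filter_out_matching() {
    local -n _fom_ref=$1        # nameref to the caller's array
    local _fom_regex=$2 _fom_elem
    local -a _fom_kept=()
    for _fom_elem in "${_fom_ref[@]}"; do
        # Keep the element only when the regex does NOT match.
        [[ $_fom_elem =~ $_fom_regex ]] || _fom_kept+=("$_fom_elem")
    done
    _fom_ref=("${_fom_kept[@]}")
}

# Example with the difficult cases from above.
ARR=(preffoo bar "pref with spaces" $'\npref with new line before' baz)
filter_out_matching ARR '^pref'
declare -p ARR
```

The caller's array must not be named like one of the function's own locals (the _fom_ prefix is there to make a clash unlikely). To keep the matching elements instead, replace || with && in the loop.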

Adam Badura
  • THANK YOU! This is working nicely... and is simple enough – davidhq Sep 15 '18 at 21:54
  • The first code snippet contains `unset -v 'ARR[$index]'`. Wouldn't the single quotes prevent the substitution of `$index`? – Roland Weber Apr 03 '20 at 07:23
  • @RolandWeber, I think this did work on my end back at the time when I wrote this. But as for now, it would have to be just checked. I'm not proficient enough to be able to tell otherwise than by experiment. – Adam Badura Apr 03 '20 at 10:49
  • 1
    @RolandWeber: No. It works with the single quotes. Or without quotes. Try this: `a=(0 1 "and two"); for i in ${!a[*]}; do echo "$i=${a[$i]}"; ((i=1)) && unset -v 'a[$i]'; done; echo; declare -p a; a=("${a[@]}"); declare -p a` – mivk Aug 01 '21 at 17:48
  • 1
    Kudos to you for an excellent well written answer, this is pure gold. – starfry Aug 04 '23 at 15:55
13

Another way to strip a flat string is to convert it to an array then use the array method:

x="preffoo bar foo prefbaz baz prefbar"
x=($x)
x=${x[@]//pref*}

Contrast this with starting and ending with an array:

x=(preffoo bar foo prefbaz baz prefbar)
x=(${x[@]//pref*})
Dennis Williamson
  • I really like this stuff as it really reduces the previous amount of code I use to do for this kind of action. – pn1 dude Jul 23 '12 at 22:03
  • 2
    It doesn't really work that well with arrays. Getting an array out of that is hard if initial elements contained spaces for example. Have for example `declare -a ARR=('element1' 'with space' 'with two spaces' 'element4')` and then do `VAR=(${ARR[@]//element*/})`. What you will get in `VAR` is not an array of two elements (`with space` and `with two spaces`) but an array of five elements (`with`, `space`, `with`, `two`, `spaces`). – Adam Badura Nov 02 '16 at 05:51
10

To strip a flat string (Hulk has already given the answer for arrays), you can turn on the extglob shell option and run the following expansion:

$ shopt -s extglob
$ unset x
$ x="preffoo bar foo prefbaz baz prefbar"
$ echo ${x//pref*([^ ])?( )}
bar foo baz

The extglob option is needed for the *(pattern-list) and ?(pattern-list) forms. This allows you to use regular expressions (although in a different form to most regular expressions) instead of just pathname expansion (*?[).

The answer that Hulk has given for arrays will work only on arrays. If it appears to work on flat strings, it's only because the array was not unset first during testing.

e.g.

$ x=(preffoo bar foo prefbaz baz prefbar)
$ echo ${x[@]//pref*/}
bar foo baz
$ x="preffoo bar foo prefbaz baz prefbar"
$ echo ${x[@]//pref*/}
bar foo baz
$ unset x
$ x="preffoo bar foo prefbaz baz prefbar"
$ echo ${x[@]//pref*/}

$
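
Note that the extglob expansion above removes pref and the rest of the word wherever pref occurs in the string, not only at the start of a word. If that matters, a word-by-word variant is possible. The following is only a sketch: the names words, kept and w are illustrative, and it assumes the words are separated by spaces or tabs on a single line.

```shell
#!/usr/bin/env bash
x="preffoo bar foo prefbaz baz prefbar"

# Split the string into words on whitespace (first line only).
read -ra words <<< "$x"

kept=()
for w in "${words[@]}"; do
    # Glob match anchored at the start of the word; keep non-matching words.
    [[ $w == pref* ]] || kept+=("$w")
done

# Join the surviving words with single spaces.
y="${kept[*]}"
echo "$y"    # prints: bar foo baz
```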
camh
  • 1
    +1 thanks for clearing up the confusion from Hulk's post and pointing out this other path. – kynan Aug 28 '10 at 06:07
7

You can do this:

Delete all occurrences of substring.

# Not specifying a replacement defaults to 'delete' ...
echo ${x[@]//pref*/}      # bar foo baz (with the array x from the question)
#               ^^          # Applied to all elements of the array.

Edit:

For a whitespace-separated string it's much the same:

x="preffoo bar foo prefbaz baz prefbar"
echo ${x[@]//pref*/}

Output:

bar foo baz

Alex Weitz
Hulk
  • Anything similar for a string of words separated by whitespace? – kynan Aug 26 '10 at 19:16
  • Seems that doesn't quite work, that removes everything after the first occurrence of `pref` – kynan Aug 26 '10 at 19:28
  • Are you using bash? I tried the very same and got an empty output – kynan Aug 26 '10 at 19:35
  • Why is there a specific requirement for bash? Can't you use /bin/sh? – Hulk Aug 26 '10 at 20:00
  • No, it's not. But I think camh pointed out why it worked for your case. – kynan Aug 28 '10 at 06:05
  • 3
    It doesn't really work that well with arrays. The `echo` of the result looks fine, alright - but the point is (or could be) to have as the result an array. While getting an array out of that is hard if initial elements contained spaces for example. Have for example `declare -a ARR=('element1' 'with space' 'with two spaces' 'element4')` and then do `VAR=(${ARR[@]//element*/})`. What you will get in `VAR` is not an array of two elements (`with space` and `with two spaces`) but an array of five elements (`with`, `space`, `with`, `two`, `spaces`). – Adam Badura Nov 02 '16 at 05:51
  • This has multiple problems and should not be used. – tripleee Sep 04 '19 at 06:44
2

I defined and used the following function:

# Removes elements from an array based on a given regex pattern.
# Usage: filter_arr pattern array
# Usage: filter_arr pattern element1 element2 ...
filter_arr() {
    local pattern=$1
    shift                       # the remaining arguments are the elements
    local kept
    # Print one element per line and drop the lines matching the pattern.
    kept=($(printf '%s\n' "$@" | grep -v -- "$pattern"))
    echo "${kept[@]}"
}

Example usage:

$ arr=(chicken egg hen omelette)
$ filter_arr "n$" ${arr[@]}

Output:

egg omelette

The output of the function is a string. To convert it back to an array:

$ arr2=(`filter_arr "n$" ${arr[@]}`)
Kshitiz Sharma
  • If array elements contain spaces this will not conserve them but instead split making new element arrays. You can see it by having `declare -a arr=('element1' 'with space' 'with two spaces' 'element4')` and filtering for `element`. The result instead of containing just `with space` and `with two spaces` will contain each word as separate element. – Adam Badura Nov 02 '16 at 06:18
2

Here's a way using grep:

(IFS=$'\n' && echo "${MY_ARR[*]}") | grep '[^.]*.pattern/[^.]*.txt'

The meat here is that IFS=$'\n' causes "${MY_ARR[*]}" to expand with newlines separating the items, so it can be piped through grep.

In particular, this will handle spaces embedded inside the items of the array.
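
To get the filtered lines back into an array, the grep output can be read with mapfile (Bash 4 or newer). This is a sketch under assumptions: the array contents and the '\.txt$' pattern are made up for illustration, and elements that themselves contain newlines would still be split into several items.

```shell
#!/usr/bin/env bash
# Illustrative data; two elements contain a space.
MY_ARR=("a file.txt" "notes.md" "b file.txt")

# Emit one element per line, filter with grep, and read each
# surviving line back as one array element.
mapfile -t FILTERED < <( (IFS=$'\n' && echo "${MY_ARR[*]}") | grep -- '\.txt$' )

declare -p FILTERED
```

Here FILTERED ends up with two elements, "a file.txt" and "b file.txt", each with its space intact.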

Marcin