132

I've got almost the same question as here.

I have an array which contains aa ab aa ac aa ad, etc. Now I want to select all unique elements from this array. I thought this would be simple with sort | uniq or with sort -u, as mentioned in that other question, but nothing changed in the array. The code is:

echo `echo "${ids[@]}" | sort | uniq`

What am I doing wrong?

Community
Jetse

16 Answers

180

A bit hacky, but this should do it:

echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '

To save the sorted unique results back into an array, do Array assignment:

sorted_unique_ids=($(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))

If your shell supports herestrings (bash should), you can spare an echo process by altering it to:

tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' '

A note as of Aug 28 2021:

According to the ShellCheck wiki entry SC2207, read -a should be used instead of an unquoted command substitution, to avoid unintended word splitting. Thus, in bash the command would be:

IFS=" " read -r -a ids <<< "$(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')"

or

IFS=" " read -r -a ids <<< "$(tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' ')"

Input:

ids=(aa ab aa ac aa ad)

Output:

aa ab ac ad

Explanation:

  • "${ids[@]}" - Syntax for working with shell arrays, whether used as part of echo or a herestring. The @ part means "all elements in the array"
  • tr ' ' '\n' - Convert all spaces to newlines. This is needed because echo prints the array elements on a single line, separated by spaces, whereas sort expects one item per line.
  • sort -u - sort and retain only unique elements
  • tr '\n' ' ' - convert the newlines we added in earlier back to spaces.
  • $(...) - Command Substitution
  • Aside: tr ' ' '\n' <<< "${ids[@]}" is a more efficient way of doing: echo "${ids[@]}" | tr ' ' '\n'
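
To see why the original attempt changed nothing, the following sketch (not part of the answer above) counts the lines that sort actually receives. The names one_line_count and unique_ids are made up for illustration, and mapfile assumes bash 4 or newer.

```shell
#!/usr/bin/env bash
ids=(aa ab aa ac aa ad)

# echo joins the elements with spaces, so sort receives a single line
# and has nothing to deduplicate.
one_line_count=$(echo "${ids[@]}" | sort -u | wc -l)

# After tr, sort receives one element per line and can drop the repeats.
# mapfile -t stores each output line as one array element (bash 4+).
mapfile -t unique_ids < <(echo "${ids[@]}" | tr ' ' '\n' | sort -u)

echo "lines seen by sort without tr: $((one_line_count))"
printf '%s\n' "${unique_ids[@]}"
```

Like the answer itself, this still assumes that no element contains a space.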
Community
sampson-chen
  • 46
    +1. A bit tidier: store uniq elements in a new array: `uniq=($(printf "%s\n" "${ids[@]}" | sort -u)); echo "${uniq[@]}"` – glenn jackman Nov 30 '12 at 16:11
  • @glennjackman oh neat! I didn't even realize you can use `printf` that way (give more arguments than format strings) – sampson-chen Nov 30 '12 at 16:17
  • Yes, the format string gets reused until all the arguments are consumed -- it's very handy – glenn jackman Nov 30 '12 at 16:26
  • +1 to sampson-chen for the explanation in your post. – g000ze Mar 15 '14 at 08:45
  • 5
    +1 I'm not sure if this is an isolated case, but putting unique items back into an array needed additional parentheses such as: `sorted_unique_ids=($(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))`. Without the additional parentheses it was giving it as a string. – whla Nov 18 '14 at 20:31
  • @sampson-chen: you could edit your answer including suggestions from glenn jackman and Michael! – caesarsol Jun 10 '15 at 09:11
  • 1
    This solution doesn't work with empty values (""), that get skipped at best, or that may add unwanted spaces. Build a script based on [array iteration](http://stackoverflow.com/a/8880633/2227298) if you have empty values in your input array. – KrisWebDev Oct 16 '15 at 20:56
  • 3
    If you don't want to alter the order of the elements, use `... | uniq | ...` instead of `... | sort -u | ...`. – Jesse Chisholm Dec 23 '16 at 16:31
  • 2
    @Jesse, `uniq` only removes _consecutive_ duplicates. In the example in this answer, `sorted_unique_ids` will end up identical to the original `ids`. To preserve order, try `... | awk '!seen[$0]++'`. See also https://stackoverflow.com/questions/1444406/how-can-i-delete-duplicate-lines-in-a-file-in-unix. – Rob Kennedy May 22 '19 at 16:14
  • If you're assigning back to a variable, you can leave out the `| tr '\n' ' '` part, as the subshell already strips the output of any newline characters by default ( if there are no quotes around the subshell command `$(...)` ). – Douwe van der Leest Jul 29 '19 at 11:13
  • 3
    -1: This breaks array elements containing a space into multiple values, which (to me) is one of the main benefits of using arrays over simple space-delimited strings. – bukzor Nov 23 '19 at 17:51
42

If you're running Bash version 4 or above (which should be the case in any modern version of Linux), you can get unique array values in bash by creating a new associative array that contains each of the values of the original array. Something like this:

$ a=(aa ac aa ad "ac ad")
$ declare -A b
$ for i in "${a[@]}"; do b["$i"]=1; done
$ printf '%s\n' "${!b[@]}"
ac ad
ac
aa
ad

This works because in any array (associative or traditional, in any language), each key can only appear once. When the for loop arrives at the second value of aa in a[2], it overwrites b[aa] which was set originally for a[0].
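
Since the keys of an associative array come back in no particular order, a variation on the same idea can keep the first-seen order by using the associative array only as a "seen" set. This is a sketch rather than part of the answer: the names seen and unique are illustrative, it assumes bash 4+, and it assumes no element is the empty string (bash generally rejects an empty associative-array subscript).

```shell
#!/usr/bin/env bash
a=(aa ac aa ad "ac ad" ac)

declare -A seen=()   # set of values already encountered (bash 4+)
unique=()            # result, in first-seen order

for item in "${a[@]}"; do
  # ${seen[key]+x} expands to x only if the key is already set.
  if [[ -z ${seen["$item"]+x} ]]; then
    seen["$item"]=1
    unique+=("$item")
  fi
done

printf '%s\n' "${unique[@]}"
```

The extra indexed array costs a little memory, but it avoids depending on the hash order of the keys.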

Doing things in native bash can be faster than using pipes and external tools like sort and uniq, though for larger datasets you'll likely see better performance if you use a more powerful language like awk, python, etc.

If you're feeling confident, you can avoid the for loop by using printf's ability to recycle its format for multiple arguments, though this seems to require eval. (Stop reading now if you're fine with that.)

$ eval b=( $(printf ' ["%s"]=1' "${a[@]}") )
$ declare -p b
declare -A b=(["ac ad"]="1" [ac]="1" [aa]="1" [ad]="1" )

The reason this solution requires eval is that array values are determined before word splitting. That means that the output of the command substitution is considered a single word rather than a set of key=value pairs.

While this uses a subshell, it uses only bash builtins to process the array values. Be sure to evaluate your use of eval with a critical eye. If you're not 100% confident that chepner or glenn jackman or greycat would find no fault with your code, use the for loop instead.

ghoti
  • produces error: expression recursion level exceeded – Benubird Feb 10 '14 at 11:36
  • 2
    @Benubird - can you perhaps pastebin your terminal contents? It works perfectly for me, so my best guess is that you've got (1) a typo, (2) an older version of bash (associative arrays were added to v4), or (3) a ridiculously large influx of cosmic background radiation caused by the quantum black hole in your neighbour's basement, generating interference with the signals within your computer. – ghoti Feb 11 '14 at 14:20
  • 1
    can't, didn't keep the one that didn't work. but, I tried running yours just now and it worked, so probably the cosmic radiation thing. – Benubird Feb 11 '14 at 14:45
  • guessing that this answer utilizes bash v4 (associative arrays) and if someone tries in bash v3 it wont work (probably not what @Benubird saw). Bash v3 is still default in many envs – nhed Apr 03 '15 at 18:32
  • @nhed ... It does indeed require bash v4. I admit, my knowledge of bash version distribution is limited... What modern systems still ship with bash v3? – ghoti Apr 03 '15 at 19:53
  • @ghoti my mac book is from late last year and `bash --version` in the default shell reports `GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14) Copyright (C) 2007 Free Software Foundation, Inc.` Still, I like this answer best, but it has a portability issue – nhed Apr 03 '15 at 21:04
  • 1
    @nhed, point taken. I see that my up-to-date Yosemite Macbook has the same version in base, though I've installed v4 from macports. This question is tagged "linux", but I've updated my answer to point out the requirement. – ghoti Apr 04 '15 at 00:50
29

I realize this was already answered, but it showed up pretty high in search results, and it might help someone.

printf "%s\n" "${IDS[@]}" | sort -u

Example:

~> IDS=( "aa" "ab" "aa" "ac" "aa" "ad" )
~> echo  "${IDS[@]}"
aa ab aa ac aa ad
~>
~> printf "%s\n" "${IDS[@]}" | sort -u
aa
ab
ac
ad
~> UNIQ_IDS=($(printf "%s\n" "${IDS[@]}" | sort -u))
~> echo "${UNIQ_IDS[@]}"
aa ab ac ad
~>
das.cyklone
  • 1
    to fix the array I was forced to do this: `ids=(ab "a a" ac aa ad ac aa);IFS=$'\n' ids2=(\`printf "%s\n" "${ids[@]}" |sort -u\`)`, so I added `IFS=$'\n'` suggested by @gniourf_gniourf – Aquarius Power Jul 23 '14 at 05:16
  • I also had to backup and, after the command, restore IFS value! or it messes other things.. – Aquarius Power Jul 23 '14 at 05:56
  • @Jetse This should be the accepted answer as it uses only two commands, no loops, no eval and is the most compact version. – mgutt Aug 12 '19 at 14:09
  • 1
    @AquariusPower Careful, you are basically doing: `IFS=$'\n'; ids2=(...)`, since temporary assignment before variable assignments is not possible. Instead use this construction: `IFS=$'\n' read -r -a ids2 <<<"$(printf "%s\n" "${ids[@]}" | sort -u)"`. – Yeti Jun 17 '20 at 03:47
18

If your array elements contain white space or any other shell special character (and can you be sure they don't?), then the first thing to do (and you should just always do this) is to expand your array in double quotes, e.g. "${a[@]}". Bash interprets this literally as "each array element as a separate argument". Within bash this simply always works.

Then, to get a sorted (and unique) array, we have to convert it to a format sort understands and be able to convert it back into bash array elements. This is the best I've come up with:

eval a=($(printf "%q\n" "${a[@]}" | sort -u))

Unfortunately, this fails in the special case of the empty array, turning the empty array into an array of 1 empty element (because printf had 0 arguments but still prints as though it had one empty argument - see explanation). So you have to catch that in an if or something.

Explanation: The %q format for printf "shell escapes" the printed argument in just such a way that bash can recover it with something like eval. Because each element is printed shell-escaped on its own line, the only separator between elements is the newline, and the array assignment takes each line as an element, parsing the escaped values into literal text.

e.g.

> a=("foo bar" baz)
> printf "%q\n" "${a[@]}"
foo\ bar
baz
> printf "%q\n"
''

The eval is necessary to strip the escaping off each value going back into the array.
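
Putting the pieces together, a guarded version might look like the sketch below. The length check is one possible way to handle the empty-array case described above, and the sample values are invented. As with any eval, this is only reasonable because %q has escaped every element first.

```shell
#!/usr/bin/env bash
a=("foo bar" baz "foo bar" 'qu"ote')

# Skip the round trip for an empty array: printf '%q\n' with no
# arguments would still print '' and yield one empty element.
if (( ${#a[@]} > 0 )); then
  # %q escapes each element so that it survives as one line for sort
  # and as one word when eval re-parses the assignment.
  eval "a=($(printf '%q\n' "${a[@]}" | sort -u))"
fi

printf '<%s>\n' "${a[@]}"
```

With plain %s instead of %q, the element foo bar would reach the array assignment as two separate words, which is the difference the escaping makes.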

vontrapp
  • This is the only code that worked for me because my array of strings had spaces. The %q is what did the trick. Thanks :) – Somaiah Kumbera Nov 23 '15 at 15:43
  • And if you don't want to alter the order of the elements, use `uniq` instead of `sort -u`. – Jesse Chisholm Dec 23 '16 at 16:32
  • 2
    Note that `uniq` does not work properly on unsorted lists, so it must always be used in combination with `sort`. – Jean Paul Feb 08 '17 at 13:53
  • 1
    uniq on an unsorted list will remove *consecutive* duplicates. It will not remove identical list elements separated by something else inbetween. uniq may be useful enough depending on the expected data and the desire to maintain original order. – vontrapp May 22 '19 at 07:08
  • Can someone tell me why I need `%q` over `%s`? – Changdae Park Feb 23 '23 at 10:21
14

'sort' can be used to order the output of a for-loop:

for i in "${ids[@]}"; do echo "$i"; done | sort

and eliminate duplicates with "-u":

for i in "${ids[@]}"; do echo "$i"; done | sort -u

Finally you can just overwrite your array with the unique elements (elements containing whitespace will still be split at this step):

ids=( $(for i in "${ids[@]}"; do echo "$i"; done | sort -u) )
corbyn42
  • And if you don't want to change the order of what's left, you don't have to: `ids=( \`for i in ${ids[@]}; do echo $i; done | uniq\` )` – Jesse Chisholm Dec 23 '16 at 16:34
  • 2
    Note, however, that if you don't change the order, you also won't get the desired result, as `uniq` only removes *adjacent* duplicate lines. – Jason Kohles Dec 23 '20 at 16:33
11

This one will also preserve order:

echo "${ARRAY[@]}" | tr '[:space:]' '\n' | awk '!a[$0]++'

and to modify the original array with the unique values:

ARRAY=($(echo "${ARRAY[@]}" | tr '[:space:]' '\n' | awk '!a[$0]++'))
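
For readers unfamiliar with the awk idiom, here is a small sketch of how it filters. The awk array name seen and the sample data are illustrative; printf is used instead of echo so that each element lands on its own line, and mapfile assumes bash 4+. Elements containing newlines would still need a different delimiter.

```shell
#!/usr/bin/env bash
ARRAY=(cc aa cc bb aa)

# seen[$0]++ evaluates to the count *before* incrementing: 0 the first
# time a line appears, positive afterwards. Negating it makes the
# pattern true only on the first occurrence, and awk's default action
# for a true pattern is to print the line.
mapfile -t DEDUPED < <(printf '%s\n' "${ARRAY[@]}" | awk '!seen[$0]++')

echo "${DEDUPED[@]}"
```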
faustus
  • Don't use `uniq`. It needs sorting, where awk does not, and the intent of this answer is to preserve ordering when the input is unsorted. – bukzor Nov 23 '19 at 18:01
  • Btw this example was made famous by this blog post: https://catonmat.net/awk-one-liners-explained-part-two. What a fascinating awk one-liner – smac89 Mar 24 '21 at 23:57
9

To create a new array consisting of unique values, ensure your array is not empty then do one of the following:

Remove duplicate entries (with sorting)

readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | sort -u)

Remove duplicate entries (without sorting)

readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | awk '!x[$0]++')

Warning: Do not try to do something like NewArray=( $(printf '%s\n' "${OriginalArray[@]}" | sort -u) ). It will break on spaces.
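
A quick way to see the difference the warning describes is to count the resulting elements. The sample array and the name Broken are made up for this sketch.

```shell
#!/usr/bin/env bash
OriginalArray=("a a" ab "a a" ac)

# Unquoted command substitution: the shell word-splits the output,
# so the single element "a a" comes back as two elements.
# shellcheck disable=SC2207
Broken=( $(printf '%s\n' "${OriginalArray[@]}" | sort -u) )

# readarray -t splits on newlines only, one element per line.
readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | sort -u)

echo "Broken has ${#Broken[@]} elements, NewArray has ${#NewArray[@]}"
```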

Six
6

How about this variation?

printf '%s\n' "${ids[@]}" | sort -u
jmg
5

Without losing the original ordering:

uniques=($(tr ' ' '\n' <<<"${original[@]}" | awk '!u[$0]++' | tr '\n' ' '))
estani
5

cat number.txt

1 2 3 4 4 3 2 5 6

Print each field on its own line: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}'

1
2
3
4
4
3
2
5
6

find the duplicate records: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}' |awk 'x[$0]++'

4
3
2

Remove duplicate records: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}' |awk '!x[$0]++'

1
2
3
4
5
6

Find only the records that occur exactly once: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i|"sort|uniq -u"}'

1
5
6
VIPIN KUMAR
5

If you want a solution that only uses bash internals, you can set the values as keys in an associative array, and then extract the keys:

declare -A uniqs
list=(foo bar bar "bar none")
for f in "${list[@]}"; do 
  uniqs["${f}"]=""
done

for thing in "${!uniqs[@]}"; do
  echo "${thing}"
done

This will output

bar
foo
bar none
rln
  • I just noticed this is essentially the same as @ghotis answer above, except his solution doesn't take list items with spaces into account. – rln Jan 11 '17 at 14:46
  • Good point. I've added quotes to my solution so it now handles spaces. I originally wrote it merely to handle the sample data in the question, but it's always good to cover contingencies like this. Thanks for the suggestion. – ghoti Feb 02 '17 at 04:21
  • Note that order isn't maintained in an associative array: https://stackoverflow.com/a/29161460/89484 – Paul Irish Mar 11 '21 at 18:44
3

Another option for dealing with embedded whitespace is to null-delimit with printf, make distinct with sort, then use a loop to pack it back into an array:

input=(a b c "$(printf "d\ne")" b c "$(printf "d\ne")")
output=()

while IFS= read -r -d '' element
do 
  output+=("$element")
done < <(printf "%s\0" "${input[@]}" | sort -uz)

At the end of this, input and output contain the desired values (provided order isn't important):

$ printf "%q\n" "${input[@]}"
a
b
c
$'d\ne'
b
c
$'d\ne'

$ printf "%q\n" "${output[@]}"
a
b
c
$'d\ne'
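
On bash 4.4 or newer, the loop can probably be replaced with mapfile, which accepts a delimiter via -d. This is a sketch under that version assumption, and it also assumes a sort that supports -z (GNU sort does).

```shell
#!/usr/bin/env bash
input=(a b c $'d\ne' b c $'d\ne')

# -d '' makes mapfile split on NUL bytes; -t strips the delimiter
# from each stored element (requires bash 4.4+).
mapfile -d '' -t output < <(printf '%s\0' "${input[@]}" | sort -uz)

printf '%q\n' "${output[@]}"
```

As with the %s\n variants, an empty input array needs a guard, because printf still emits one (empty) record when given no arguments.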
Morgen
2

All of the following work in bash and pass ShellCheck without errors, provided you suppress SC2207. (Arrays and here-strings are bash features, so do not expect these to work in a strictly POSIX sh.)

arrOrig=("192.168.3.4" "192.168.3.4" "192.168.3.3")

# NO SORTING
# shellcheck disable=SC2207
arr1=($(tr ' ' '\n' <<<"${arrOrig[@]}" | awk '!u[$0]++' | tr '\n' ' ')) # @estani
len1=${#arr1[@]}
echo "${len1}"
echo "${arr1[*]}"

# SORTING
# shellcheck disable=SC2207
arr2=($(printf '%s\n' "${arrOrig[@]}" | sort -u)) # @das.cyklone
len2=${#arr2[@]}
echo "${len2}"
echo "${arr2[*]}"

# SORTING
# shellcheck disable=SC2207
arr3=($(echo "${arrOrig[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')) # @sampson-chen
len3=${#arr3[@]}
echo "${len3}"
echo "${arr3[*]}"

# SORTING
# shellcheck disable=SC2207
arr4=($(for i in "${arrOrig[@]}"; do echo "${i}"; done | sort -u)) # @corbyn42
len4=${#arr4[@]}
echo "${len4}"
echo "${arr4[*]}"

# NO SORTING
# shellcheck disable=SC2207
arr5=($(echo "${arrOrig[@]}" | tr "[:space:]" '\n' | awk '!a[$0]++')) # @faustus
len5=${#arr5[@]}
echo "${len5}"
echo "${arr5[*]}"

# OUTPUTS

# arr1
2 # length
192.168.3.4 192.168.3.3 # items

# arr2
2 # length
192.168.3.3 192.168.3.4 # items

# arr3
2 # length
192.168.3.3 192.168.3.4 # items

# arr4
2 # length
192.168.3.3 192.168.3.4 # items

# arr5
2 # length
192.168.3.4 192.168.3.3 # items

Each of these yields an array of length 2 containing the correct items. This answer basically summarises and tidies up the other answers in this post and is a useful quick reference. Attribution to each original answer is given in the trailing comments.

danday74
2

In zsh you can use (u) flag:

$ ids=(aa ab aa ac aa ad)
$ print ${(u)ids}
aa ab ac ad
TupacAmaru
0

Try this to get the unique values of the first column in a comma-separated file:

awk -F, '{a[$1];}END{for (i in a)print i;}'
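
As an illustration, with some invented comma-separated input the command behaves as sketched below. The trailing sort is an addition here, since awk's for (i in a) visits keys in an unspecified order; the variable names are hypothetical.

```shell
#!/usr/bin/env bash
csv_data=$'aa,1\nab,2\naa,3\nac,4'

# a[$1] merely references the key, which is enough to create it;
# duplicates collapse because a key can exist only once.
first_col=$(awk -F, '{a[$1]} END {for (i in a) print i}' <<< "$csv_data" | sort)

echo "$first_col"
```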
-2
# Read the file line by line (a plain `for line in $(cat file)` would
# iterate over whitespace-separated words, not lines)
while IFS= read -r line; do
  # Print the line
  printf '%s\n' "$line"
# End the loop, then sort the output (add -u to keep only unique lines)
done < /path/to/my/file | sort -u
K Law