
I am trying to write a bash function, but it doesn't work. It takes a file in the following format:

1 2 first 3
4 5 second 6
...

I want to take the third word of every line and store it in the array "arr", skipping strings that are already in the array. When I enabled the "echo" command just inside the for loop, it printed only the first string ("first" in the example above) on every iteration.

Thank you!

function storeDevNames {

n=0
b=0
while read line; do
    line=$line
    tempArr=( $line )
    name=${tempArr[2]}
    for i in $arr ; do
        #echo ${arr[i]}
        if [ "${arr[i]}" == "$name" ]; then
            b=1
            break
        fi
    done
    if [ "$b" -eq 0 ]; then
        arr[n]=$name
        n=$(($n+1))
    fi
    b=0
done < $1
}
Cyrus

3 Answers


You can replace all of your read block with:

arr=( $(awk '{print $3}' <"$1" | sort | uniq) )

This will fill arr with only unique names from the 3rd word such as first, second, ... This will reduce the entire function to:

function storeDevNames {
    arr=( $(awk '{print $3}' <"$1" | sort | uniq) )
}

Note: this yields the unique device names in sorted order, because sort discards the original order of the lines. If you need to preserve the original order (keeping only the first occurrence of each name), see 4ae1e1's alternative.

David C. Rankin
  • Your answer breaks the order of lines, which might (or might not) be important. See [the other `awk` answer](http://stackoverflow.com/a/29455557/1944784) (disclosure: mine) for how to preserve the order. – 4ae1e1 Apr 05 '15 at 08:42
  • Indeed it does, if retaining the order of device names is important, then your answer is the one to use. – David C. Rankin Apr 05 '15 at 08:44

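The following line is the problem: $arr expands to only the first element, ${arr[0]}, so the loop never compares $name against the rest of the array.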

    for i in $arr ; do

I changed it as follows and it works for me:

#! /bin/bash

function storeDevNames {
    n=0
    b=0
    while read line; do
        # line=$line # ?!
        tempArr=( $line )
        name=${tempArr[2]}
        for i in "${arr[@]}" ; do
            if [ "$i" == "$name" ]; then
                b=1
                break
            fi
        done
        if [ "$b" -eq 0 ]; then
            arr[n]=$name
            (( n++ ))
        fi
        b=0
    done
}

storeDevNames < <(cat <<EOF 
1 2 first 3
4 5 second 6
7 8 first 9
10 11 third 12
13 14 second 15
EOF
)

echo "${arr[@]}"
choroba
  • You're right, it does print the whole array. I still don't get two things: 1. Why does it store two identical strings? Is there something wrong with my if-else? 2. Why doesn't it print every single element in the array with this echo command but prints only the first one every time? – Gal Fleissig Apr 05 '15 at 08:55
  • @GalFl: I don't understand. I'm getting no duplicates. Show the code that produces them in the question. – choroba Apr 05 '15 at 08:58
  • I tried it now with a simple txt file and it worked with no duplicates, but I tried it with a .comp file (the format i have to work with) and it does show duplicates. Maybe it's something with this format? maybe there is something different with the spaces or end of lines? – Gal Fleissig Apr 05 '15 at 09:10
  • It works! So just to fully understand: the "i" in the for loop represents a number (like in C for example) or in this case a string? – Gal Fleissig Apr 05 '15 at 09:21
  • It's the string. If you wanted numbers, you'd need something like `for i in $(seq 0 $((${#arr[@]} - 1)))` or `for ((i=0; i<${#arr[@]}; i++))` – choroba Apr 05 '15 at 09:24
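To make the difference discussed in these comments concrete, here is a small sketch. The array contents are made up for the demo; it compares looping over $arr, over "${arr[@]}", and over the indices via "${!arr[@]}".

```bash
#!/usr/bin/env bash
# Demo array; the values are illustrative only.
arr=(first second third)

# $arr is shorthand for ${arr[0]}, so this loop body runs once.
count_plain=0
for i in $arr; do
    count_plain=$((count_plain + 1))
done

# "${arr[@]}" expands to every element, one word per element.
count_all=0
for i in "${arr[@]}"; do
    count_all=$((count_all + 1))
done

# "${!arr[@]}" expands to the indices, for when a numeric index is needed.
last=""
for i in "${!arr[@]}"; do
    last="$i:${arr[i]}"
done

echo "$count_plain $count_all $last"
```

With the three-element array above, the first loop runs once and the second runs three times, which matches the behavior described in the question.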

You're using the wrong tool. awk is designed for this kind of job.

awk '{ if (!seen[$3]++) print $3 }' <"$1"

This one-liner prints the third column of each line, skipping duplicates while preserving the order of lines (only the first occurrence of each string is printed). sort | uniq, on the other hand, does not preserve the original order. The one-liner should also be faster than sort | uniq on large files (which doesn't seem to apply in OP's case), since it scans the file once with a hash lookup per line, while sort needs O(n log n) comparisons.

As an example, for an input file with contents

1 2 first 3
4 5 second 6
7 8 third 9
10 11 second 12
13 14 fourth 15

the above awk one-liner gives you

first
second
third
fourth

To put the results in an array:

arr=( $(awk '{ if (!seen[$3]++) print $3 }' <"$1") )

Then echo "${arr[@]}" will give you first second third fourth.
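One caveat: the unquoted $(...) in the array assignment goes through word splitting and pathname expansion, so a name such as * would be replaced by file names. A possible alternative, sketched here assuming bash 4 or later, is mapfile, which stores one line per element. The temporary sample file and its contents are made up for the demo.

```bash
#!/usr/bin/env bash
# Hypothetical sample input, created only for this demo.
input=$(mktemp)
printf '%s\n' '1 2 first 3' '4 5 second 6' '7 8 first 9' '10 11 * 12' > "$input"

# mapfile -t reads one array element per line and strips the trailing newline,
# so no word splitting or glob expansion is applied to the names.
mapfile -t arr < <(awk '!seen[$3]++ { print $3 }' "$input")

rm -f "$input"
printf '%s\n' "${arr[@]}"
```

Here the literal * survives as the third element instead of being expanded to the files in the current directory.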

4ae1e1
  • This looks like a really good solution, but since I'm a beginner in bash I'm trying to write the simplest code to understand rather than the most efficient to write. Plus we are probably not allowed to use "awk". Thank you very much! – Gal Fleissig Apr 05 '15 at 09:13
  • @GalFl No problem, this might still help future users. – 4ae1e1 Apr 05 '15 at 09:13
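For readers who cannot use awk, the same "seen" idea can be sketched in plain bash with an associative array (bash 4 or later). This is only one possible version, not code from any of the answers; the names seen, a, b and rest, and the sample input, are assumptions made for the demo.

```bash
#!/usr/bin/env bash
# Sketch: collect unique third fields in first-seen order without awk.
# Requires bash 4+ for associative arrays.
storeDevNames() {
    local a b name rest
    declare -A seen=()   # declared inside the function, so it is local
    arr=()               # global result array, as in the question
    # read splits each line on whitespace; the third field lands in name.
    while read -r a b name rest; do
        [[ -n $name ]] || continue          # skip lines with fewer than 3 fields
        if [[ -z ${seen[$name]+x} ]]; then  # true if name was not seen before
            seen[$name]=1
            arr+=("$name")
        fi
    done < "$1"
}

# Hypothetical sample input, created only for this demo.
input=$(mktemp)
printf '%s\n' '1 2 first 3' '4 5 second 6' '7 8 first 9' \
    '10 11 third 12' '13 14 second 15' > "$input"
storeDevNames "$input"
rm -f "$input"
echo "${arr[@]}"
```

The associative array replaces the inner for loop of the original function: membership is checked with a single lookup instead of a scan over arr.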