I'm trying to run a script for all files in a directory with a common ID.

ls -1 *.vcf

a1.sourceA.vcf
a1.sourceB.vcf
a1.sourceC.vcf
a2.sourceA.vcf
a2.sourceB.vcf
a2.sourceC.vcf
a3.sourceA.vcf
a3.sourceC.vcf

The ID in each case precedes the first . (a1, a2 or a3). For each ID, I want all the sources for that ID stored in an associative array keyed by the ID, e.g.:

a1 => [a1.sourceA.vcf, a1.sourceB.vcf, a1.sourceC.vcf]

I've attempted this as follows:

for file in $(ls *.vcf | sort)
do
  id=$(echo $file | cut -d '.' -f 1)
  vcfs[$id]+=$file

done

for i in "${!vcfs[@]}"
do
  echo "key  : $i"
  echo "value: ${vcfs[$i]}"
  echo " "
done

But I can't figure out how to get it working.

In Perl I would push values onto a hash of arrays in the loop:

push @{$vcfs{$id}}, $file;

to give me a data structure like this:

  'a1' => [
            'a1.sourceA.vcf',
            'a1.sourceB.vcf',
            'a1.sourceC.vcf'
          ],
  'a3' => [
            'a3.sourceA.vcf',
            'a3.sourceC.vcf'
          ],
  'a2' => [
            'a2.sourceA.vcf',
            'a2.sourceB.vcf',
            'a2.sourceC.vcf'
          ]

How can I achieve this in bash?

fugu
  • Does it stop at `a3`, or do you also have `a4` and so on? – sjsam May 10 '17 at 12:20
  • @sjsam - any number of files – fugu May 10 '17 at 12:22
  • See http://stackoverflow.com/a/28051297/1100158. That was the answer for a list of lists, but your solution will be similar. – ccarton May 10 '17 at 12:27
  • @ccarton has an answer for you. You can push to an associative array, but I'm not going to answer that, b/c you don't want to use an associative array (which has one value per key), but a hash array. Bash doesn't support that, so you need the extensive work around that ccarton provided. – SaintHax May 10 '17 at 13:10

1 Answer

Adapted from the answer linked in the question's comments:

unset a1 a2 a3

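# Append the remaining arguments to the global indexed array named by $1,
# creating that array first if it does not exist yet.
# Needs bash 4.3+ (namerefs); the ID must be a valid variable name.
function push {
    local arr_name=$1
    shift
    if [[ $(declare -p "$arr_name" 2>&1) != "declare -a "* ]]
    then
        declare -g -a "$arr_name"
    fi
    declare -n array=$arr_name
    array+=("$@")    # quoted so that names containing spaces stay intact
}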

for file in *.vcf; do [[ -e $file ]] && push "${file%%.*}" "$file"; done

(IFS=,;echo "${a1[*]}")
(IFS=,;echo "${a2[*]}")
(IFS=,;echo "${a3[*]}")
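
If one array per ID feels heavy, something closer to the attempt in the question may be enough. The following is only a sketch, assuming bash 4.2+ and file names without whitespace (a space is used as the separator); group_vcfs and print_groups are hypothetical helper names, not part of the original answer.

```bash
#!/usr/bin/env bash
# Sketch only: one associative array, each value a space-separated list.
# Assumption: file names contain no whitespace, because a space is the separator.

# group_vcfs DIR (hypothetical helper): fill the global array vcfs from DIR/*.vcf
group_vcfs() {
    local dir=$1 file name id
    declare -gA vcfs=()                   # global associative array, reset on each call
    for file in "$dir"/*.vcf; do
        [[ -e $file ]] || continue        # the glob matched nothing
        name=${file##*/}                  # strip the directory part
        id=${name%%.*}                    # text before the first dot
        vcfs[$id]+="${vcfs[$id]:+ }$name" # add a separator only after the first entry
    done
}

# print_groups (hypothetical helper): split each value back into an array and print it
print_groups() {
    local id sources
    for id in "${!vcfs[@]}"; do
        read -ra sources <<< "${vcfs[$id]}"
        printf '%s => %s file(s): %s\n' "$id" "${#sources[@]}" "${sources[*]}"
    done
}
```

Calling group_vcfs . and then print_groups in the directory from the question should print one line per ID. The order of the keys is not guaranteed.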

Depending on your needs, a plain for loop over a glob pattern may be sufficient:

for file in a1.*.vcf; do ... ; done
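
As a rough illustration of the pattern approach, the files for a single ID can also be collected directly into an array. This is a sketch under my own assumptions: files_for_id and the global array matches are hypothetical names, and nullglob is switched on temporarily so that an unmatched pattern yields an empty array.

```bash
#!/usr/bin/env bash
# Sketch only: collect the files for one ID straight from a glob.
# files_for_id DIR ID (hypothetical helper) leaves the result in the global array "matches".

files_for_id() {
    local dir=$1 id=$2 restore
    restore=$(shopt -p nullglob || true)  # remember the current nullglob setting
    shopt -s nullglob                     # an unmatched pattern expands to nothing
    matches=("$dir/$id".*.vcf)            # the dot after the ID keeps a1 from matching a10
    $restore                              # put nullglob back as it was
    matches=("${matches[@]##*/}")         # keep only the file names
}
```

After files_for_id . a1, the array can be passed on as "${matches[@]}", for example to the script that should run once per ID.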

Finally, do not loop over $(ls ...), as some other answers do. See:

Why you shouldn't parse the output of ls
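
To make the problem concrete, here is a small sketch that counts loop iterations for both styles. count_with_ls and count_with_glob are hypothetical helpers written for this illustration only, and the default IFS is assumed.

```bash
#!/usr/bin/env bash
# Sketch only: compare how many iterations each loop style produces.
# count_with_ls and count_with_glob are hypothetical helpers for illustration.

count_with_ls() {
    local dir=$1 n=0 file
    for file in $(ls "$dir"); do      # unquoted: the output is split on whitespace
        n=$((n + 1))
    done
    echo "$n"
}

count_with_glob() {
    local dir=$1 n=0 file
    for file in "$dir"/*; do          # each match stays one word, whatever it contains
        [[ -e $file ]] || continue    # the glob matched nothing
        n=$((n + 1))
    done
    echo "$n"
}
```

With a directory containing a single file whose name includes a space, the ls version iterates twice and the glob version once.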

Nahuel Fouilleul