I am new to associative arrays in bash, so please forgive me if I sound silly somewhere. Let's say I am reading through a large file and using a bash (version 4.2.46) associative array to store FDR values for genes. For one file, I am simply doing:

declare -A array

while read ID GeneID geneSymbol chr strand exonStart_0base exonEnd upstreamES upstreamEE downstreamES downstreamEE ID IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2 IncFormLen SkipFormLen PValue FDR IncLevel1 IncLevel2 IncLevelDifference; do 
    array[$geneSymbol]="${array[$geneSymbol]}${array[$geneSymbol]:+,}$FDR" ; 
done < input.txt

This stores the FDR values, which I can print by doing

    for key in "${!array[@]}"; do echo "$key->${array[$key]}"; done 

# Prints out
"ABHD14B"->0.285807588279,0.898327660004,0.820468496328
"DHFR"->0.464931314555,0.449582575347
...

I naively tried to read several files into my array by doing

declare -A array

find ./aligned.filtered/rMAT*/MATS_output/SE.MATS.JunctionCountOnly.txt  -type f -exec cat {} + | 

while read ID GeneID geneSymbol chr strand exonStart_0base exonEnd upstreamES upstreamEE downstreamES downstreamEE ID IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2 IncFormLen SkipFormLen PValue FDR IncLevel1 IncLevel2 IncLevelDifference; 
do  array[$geneSymbol]="${array[$geneSymbol]}${array[$geneSymbol]:+,}$FDR" ;  
done

But in this case my array ends up being empty. I can of course cat all the files I need and save them into a single file that I can use as above, but it would be nice to know how to make an associative array to store data from several distinct files.

Thank you very much!

1 Answer


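You probably shouldn't be doing this in bash in the first place, but your main problem is that the while loop runs in a subshell induced by the pipeline, so every assignment to array is lost when that subshell exits. Use process substitution to invert the relationship: the loop then runs in the current shell and reads from the output of find.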

(Also, don't give names to all the fields you don't actually use; just split the line into an indexed array and pick out the two fields you actually want.)

while read -ra fields; do
  geneSymbol=${fields[2]}   # geneSymbol is the third name in your read list, so index 2
  FDR=${fields[...]}   # some number; i'm not counting
  array[$geneSymbol]="${array[$geneSymbol]}${array[$geneSymbol]:+,}$FDR"
done < <(find ./aligned.filtered/rMAT*/MATS_output/SE.MATS.JunctionCountOnly.txt  -type f -exec cat {} +)
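
To make the subshell effect concrete, here is a minimal sketch that runs the same loop both ways. The sample data and the names sample, piped, and kept are made up for illustration, and it assumes bash 4 or later with the lastpipe option left at its default (off).

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two lines of "gene value" pairs.
sample=$'DHFR 0.46\nDHFR 0.44'

# Pipeline: the while loop runs in a subshell, so the assignments
# never reach the array in the parent shell.
declare -A piped=()
printf '%s\n' "$sample" | while read -r gene value; do
  piped[$gene]="${piped[$gene]}${piped[$gene]:+,}$value"
done
echo "pipeline: ${#piped[@]} key(s)"

# Process substitution: the while loop runs in the current shell,
# so the array keeps what the loop stored.
declare -A kept=()
while read -r gene value; do
  kept[$gene]="${kept[$gene]}${kept[$gene]:+,}$value"
done < <(printf '%s\n' "$sample")
echo "process substitution: ${#kept[@]} key(s), DHFR=${kept[DHFR]}"
```

With default settings the first echo should report zero keys and the second should report one key holding both values.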

find probably isn't necessary; just put your while loop inside a for loop:

for f in ./aligned.filtered/rMAT*/MATS_output/SE.MATS.JunctionCountOnly.txt; do
  while read -a fields; do
    ...
  done < "$f"
done
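
For reference, a filled-in sketch of that loop which can be run on its own. The temporary files and the make_row helper are made up for the demo, and the indices 2 and 19 are an assumption obtained by counting the names in the question's read list (geneSymbol is third, FDR is twentieth); check them against your actual files.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: small files using the question's 23-column layout.
tmpdir=$(mktemp -d)

# make_row GENE FDR: print one line with GENE in column 3 and FDR in column 20;
# every other column is a placeholder "x".
make_row() {
  local gene=$1 fdr=$2 row=() i
  for ((i = 0; i < 23; i++)); do row[i]=x; done
  row[2]=$gene
  row[19]=$fdr
  echo "${row[*]}"
}
make_row DHFR 0.46 > "$tmpdir/a.txt"
make_row DHFR 0.44 > "$tmpdir/b.txt"
make_row ABHD14B 0.28 >> "$tmpdir/b.txt"

declare -A array=()
for f in "$tmpdir"/*.txt; do
  while read -ra fields; do
    geneSymbol=${fields[2]}   # assumed index: third name in the read list
    FDR=${fields[19]}         # assumed index: twentieth name in the read list
    array[$geneSymbol]="${array[$geneSymbol]}${array[$geneSymbol]:+,}$FDR"
  done < "$f"
done

# Print the collected values, one gene per line.
for key in "${!array[@]}"; do echo "$key->${array[$key]}"; done
rm -rf "$tmpdir"
```

Because each file is attached to the loop by redirection rather than a pipe, the loop runs in the current shell and array is still populated after the for loop ends.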
chepner