
I have a shell script that currently outputs information in the following format:

cellname1
cellvalue1
cellname2
cellvalue2
...

The tricky part is that this script iterates over multiple sets of data with different cellnames (a few always match, a few match only sometimes). I also do not know all possible values for cellname in advance. So the script should identify the right column by looking at the cellname and put the value in that column. When a cellname appears for the first time, it should simply be added as a new column.

Sample output:

cellname1, cellname2, cellname3 # not necessarily needed since I can add them at the end
value1, value2, value3
foo1, foo2, foo3
bar1, bar2, bar3, bar4 # <-- see the new value here

I am still new to bash and would appreciate any help here.

Fuzzyma
  • Please, show the input corresponding to the sample output, it's unclear what's going on. How do you know that a new line in the CSV should start? – choroba Oct 24 '16 at 14:08
  • The script loops over sets of data so it knows when one set is finished and the next starts – Fuzzyma Oct 24 '16 at 14:16

1 Answer


The trick is to loop through the data two lines at a time, store the values in an array, and output the CSV at the end. (If you want, you can print the output in the if [ -z "$name" ] block instead, but then you lose the nice headers.)

#!/bin/bash
declare -A cell
declare -A head

i=0
while IFS= read -r name
do
    if [ -z "$name" ]
    then
        ((i+=1))
    else
        head[$name]=$name
        IFS= read -r value
        cell[$i,$name]=$value
    fi
done < "${1:-/dev/stdin}"

printf "%-10s; " "${head[@]}"; echo
printf "%.0s----------; " "${head[@]}"; echo

j=0
until [ "$j" -gt "$i" ]; do
    for name in "${head[@]}"
    do
        printf "%-10s; " "${cell[$j,$name]}"
    done
    echo
    ((j+=1))
done

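The above script presumes the sets are separated by a single empty line. Because bash associative arrays are unordered, the columns may appear in any order. For a data file that starts like this, it will return: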

$ head data
head1
value1-1
head2
value2-1

head2
value2-2

$ ./csvgen.sh data
head2     ; head3     ; head1     ; head4     ; 
----------; ----------; ----------; ----------; 
value2-1  ;           ; value1-1  ;           ; 
value2-2  ; value3-2  ;           ;           ; 
value2-3  ;           ; value1-3  ; value4-3  ; 
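
Since the column order above depends on how bash happens to iterate the associative array, here is a minimal sketch of one way to keep columns in first-seen order and emit comma-separated output. It assumes bash 4 or newer. The function name to_csv and the arrays order and seen are made up for this example and are not part of the script above. It does no CSV quoting, so values that contain commas would need extra handling.

```shell
#!/usr/bin/env bash
# Sketch: same two-lines-at-a-time parsing, but remember header order.
# to_csv, order and seen are hypothetical names chosen for this example.
to_csv() {
    local -A cell=() seen=()
    local -a order=()
    local name value sets=0 j line

    while IFS= read -r name; do
        if [ -z "$name" ]; then
            sets=$((sets + 1))            # blank line: the next set begins
            continue
        fi
        IFS= read -r value || value=""    # the value is the following line
        if [ -z "${seen[$name]:-}" ]; then
            seen[$name]=1
            order+=("$name")              # first appearance: new column
        fi
        cell[$sets,$name]=$value
    done

    # Header row, with the trailing comma stripped.
    line=$(printf '%s,' "${order[@]}")
    printf '%s\n' "${line%,}"

    # One row per set; cells that were never set come out empty.
    for ((j = 0; j <= sets; j++)); do
        line=""
        for name in "${order[@]}"; do
            line+="${cell[$j,$name]:-},"
        done
        printf '%s\n' "${line%,}"
    done
}
```

It reads from standard input, so it can be fed a file with to_csv < data or sit at the end of a pipeline.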

How it works:

Loop over each line of either a file or stdin:

while IFS= read -r name
do
# ...
done < "${1:-/dev/stdin}"
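
The redirection relies on the ${1:-/dev/stdin} expansion, which falls back to /dev/stdin when no file name is passed as the first argument. A small sketch of that loop on its own, reading a name and then the value on the following line; print_pairs is a hypothetical helper name used only for illustration, and blank separator lines are simply skipped here.

```shell
#!/usr/bin/env bash
# Sketch: read name/value pairs two lines at a time from a file or stdin.
# print_pairs is a hypothetical helper name.
print_pairs() {
    local name value
    while IFS= read -r name; do
        [ -z "$name" ] && continue        # skip the empty set separators
        IFS= read -r value                # the value is the next line
        printf '%s=%s\n' "$name" "$value"
    done < "${1:-/dev/stdin}"             # first argument, else stdin
}
```

Both print_pairs data and some_command | print_pairs should behave the same way on systems that provide /dev/stdin.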

if [ -z "$name" ] # If the line has a length of zero the set has ended
then              # so increase the set index by 1.
    ((i+=1))
else
    head[$name]=$name  # this array contains all the headers we have seen
    IFS= read -r value  # read the next line to $value
    cell[$i,$name]=$value # save $value in array indexed by set and header
fi

printf "%-10s; " "${head[@]}"  # print each header from the head array
echo   # the above won't end the line, so echo for a "\n"

printf "%.0s----------; " "${head[@]}" # %.0s truncates the input to nothing,
echo                                   # printing only the '----------'
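
Both printf calls work because printf re-applies its format string once for every remaining argument. A short sketch with made-up column names (cols is example data, not taken from the script):

```shell
#!/usr/bin/env bash
# Sketch: printf repeats its format once per argument.
cols=("head1" "head2" "head3")            # example data, not from the script

# %-10s left-aligns each name and pads it to 10 characters.
header=$(printf '%-10s; ' "${cols[@]}")

# %.0s consumes one argument but prints zero characters of it,
# so only the literal dashes are emitted, once per column.
rule=$(printf '%.0s----------; ' "${cols[@]}")

printf '%s\n%s\n' "$header" "$rule"
```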

until [ "$j" -gt "$i" ]; do     # for each set index
    for name in "${head[@]}"    # loop through the headers
    do
        printf "%-10s; " "${cell[$j,$name]}" # then print the values
    done
    echo # end each set with "\n"
    ((j+=1))
done
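
Bash has no real two-dimensional arrays; the cell[$i,$name] subscripts are plain strings such as "0,head1" used as keys of a single associative array. A minimal sketch of that idea, assuming bash 4 or newer; lookup is a hypothetical helper, and header names that themselves contain commas could produce ambiguous keys.

```shell
#!/usr/bin/env bash
# Sketch: emulate a 2D table with one associative array and "row,column" keys.
declare -A cell=()

cell[0,head1]="value1-1"                  # key is the string "0,head1"
cell[1,head2]="value2-2"                  # key is the string "1,head2"

# lookup is a hypothetical helper; cells never assigned expand to "".
lookup() {
    printf '%s' "${cell[$1,$2]:-}"
}
```

This is why the output loop can ask for every set and header combination without checking first: a missing cell just prints as an empty, padded field.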
Finbar Crago
  • That's _exactly_ what I am looking for. Even the set separation matches. It would be nice if you could add some comments to the code so that I can understand it. – Fuzzyma Oct 24 '16 at 18:28
  • No problem, the answer is updated. Bash is not all that great for text processing; moving forward you will probably want to look into perl or awk for these types of jobs... – Finbar Crago Oct 24 '16 at 19:38