
I'm stuck on some homework. The assignment is to accept an input file and compute some statistics on its values. The user may specify whether to calculate the statistics by row or by value. The script must be pure bash, so I can't use awk, sed, perl, python, etc.

sample input:
    1   1   1   1   1   1   1
    39  43  4   3225    5   2   2
    6   57  8   9   7   3   4
    3   36  8   9   14  4   3
    3   4   2   1   4   5   5
    6   4   4814    7   7   6   6

I can't figure out how to sort and process the data by column. My code for processing the rows works fine.

# CODE FOR ROWS
while read -r line; do

        echo $(printf "%d\n" $line | sort -n) | tr ' ' '\t' > sorted.txt
        ....
        # I perform the stats calculations
        # for this row by working with the temp file sorted.txt
done

How could I process this data by column? I've never worked with shell script so I've been staring at this for hours.

benpaul
  • check this out: https://meta.stackoverflow.com/questions/334822/how-do-i-ask-and-answer-homework-questions. – PseudoAj Jul 07 '17 at 22:48
  • Since you're already familiar with `sort`, have a look at its `-k` option. – randomir Jul 07 '17 at 22:54
  • I read the response to the question at the link. So should I not ask this question? Or do you suggest I ask it in a different way or provide more information? – benpaul Jul 07 '17 at 22:59
  • Asking is fine. You need to narrow your question. A *"how do I do it?"* isn't really a proper homework question; a *"here is what I have tried and here is where I'm stuck"* is. There are shades of gray in between. In your case, after you read `line` it would be nice to know how many values you have in `line`. Since they are simply tab-separated integers, why not put them in an indexed array (e.g. `array=( $(echo $line) )`)? Then you know `ncol=${#array[@]}`, and you can validate the requested column and build a column array based on the requested index. – David C. Rankin Jul 08 '17 at 00:08
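The array approach from the last comment can be sketched like this; the file name `input.txt` and the requested column number are assumptions for illustration:

```shell
# Read each row into an indexed array, then collect one column.
col=2                               # requested column, 1-based (assumed)
column=()                           # values gathered from that column
while read -r -a fields; do
    ncol=${#fields[@]}              # number of values on this line
    column+=( "${fields[col-1]}" )  # array indices are 0-based, hence col-1
done < input.txt
printf '%s\n' "${column[@]}"        # one value per line, ready for sort -n
```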

1 Answer


If you want to analyze by columns, you'll need the number of columns (cols) first. head -n 1 gives you the first row, and awk's NF counts the number of fields in it, which is the number of columns.

cols=$(head -n 1 input.txt | awk '{print NF}')
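Since the assignment rules out awk, the same count can be taken in pure bash by reading the first row into an array (the file name `input.txt` is an assumption):

```shell
# Count the columns without awk: read the first line into an array
# and take its length. Fields are split on whitespace/tabs (default IFS).
read -r -a first_row < input.txt
cols=${#first_row[@]}
echo "$cols"
```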

Then you can use cut with the tab delimiter to grab each column from input.txt (cut numbers fields from 1), and run the result through sort -n, as in your original post.

$ for i in `seq 1 $cols`; do cut -f$i -d$'\t' input.txt; done | sort -n > output.txt
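Note that this loop pools every column into a single sort, which gives you statistics over all values. If you need per-column statistics instead, sort inside the loop and keep one file per column; a sketch (the `col_N.txt` names are an assumption):

```shell
# Per-column variant: sort each column separately instead of
# pooling all values into one stream. cut splits on tab by default.
for i in $(seq 1 "$cols"); do
    cut -f"$i" input.txt | sort -n > "col_$i.txt"
done
```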

For rows, you can use the shell built-in printf with the format specifier %d for integers. The sort command works on lines of input, so we replace spaces ' ' with newlines \n using the tr command:

$ while read -r line; do echo $(printf "%d\n" $line); done < input.txt | tr ' ' '\n' | sort -n > output.txt
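The echo/tr round trip can actually be dropped: printf already emits one value per line, so a slightly simpler equivalent (still assuming `input.txt`) is:

```shell
# printf prints one integer per line on its own, so no tr is needed.
while read -r line; do
    printf '%d\n' $line   # word-splitting on the unquoted $line is intentional
done < input.txt | sort -n > output.txt
```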

Now use the output file to gather the statistics:

Min: cat output.txt | head -n 1

Max: cat output.txt | tail -n 1

Sum: (courtesy of Dimitre Radoulov): cat output.txt | paste -sd+ - | bc

Mean: (courtesy of porges): cat output.txt | awk '{ total += $1 } END { print total/NR }'

Median: (courtesy of maxschlepzig): cat output.txt | awk ' { a[i++]=$1; } END { print a[int(i/2)]; }'

Histogram: cat output.txt | uniq -c

      8 1
      3 2
      4 3
      6 4
      3 5
      4 6
      3 7
      2 8
      2 9
      1 14
      1 36
      1 39
      1 43
      1 57
      1 3225
      1 4814
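If the no-awk restriction applies to these statistics as well, min, max, sum, and (integer) mean can be computed with bash arithmetic alone. A minimal sketch, assuming output.txt holds one sorted integer per line:

```shell
# Pure-bash statistics over a sorted file of integers (output.txt assumed).
sum=0
count=0
min=
max=
while read -r n; do
    if [ -z "$min" ]; then min=$n; fi   # first value of sorted input is the min
    max=$n                              # last value read ends up as the max
    sum=$(( sum + n ))
    count=$(( count + 1 ))
done < output.txt
echo "min=$min max=$max sum=$sum mean=$(( sum / count ))"
```

Note that $(( sum / count )) is integer division; bash has no floating-point arithmetic, which is why the answer above leans on bc and awk for the fractional results.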
Andy J