Average value in CSV files bash script

Question

I'm currently writing a bash script to find out a server's average memory usage per hour; it writes its output to a .csv file. The script runs every 10 minutes, so after six runs I have six different values for each hour in my .csv file.

What I'm trying to do is use the script to find the average value for each hour.

#date(YYYYMMDDHHmm) total     used
201811270000        10        3
201811270010        10        4
201811270020        10        5
201811270030        10        9
201811270040        10        8
201811270050        10        2
201811270100        10        5
201811270110        10        1
201811270120        10        7
201811270130        10        6
201811270140        10        5
201811270150        10        2
201811270200        10        1

Based on the output above, does anyone know a way I can find the average of each hour? For example:

The average of hour 201811270000: 5.166666666666667
The average of hour 201811270100: 4.333333333333333

How do I go about this?

Is it possible to do so?

Welcome to Stack Overflow! Sorry, this is not the way Stack Overflow works. Questions of the form "I want to do X, please give me tips and/or sample code" are considered off-topic. Please visit the [help] and read [ask], and especially read [Why is “Can someone help me?” not an actual question?](http://meta.stackoverflow.com/q/284236) – kvantour, Nov 28 '18 at 10:28
Furthermore, you mention a CSV file, which is a _comma-separated values_ file. It seems your input is not a CSV file. – kvantour, Nov 28 '18 at 10:29

score 2 · Answer 1 · answered Nov 28 '18 at 10:41

Awkward,

awk '
  function calc() {
    if (count) print "The average of hour " date ": " (sum/count);
    count=0; sum=0; date=$1;
  }
  /^#/ {next}             # throw away comment lines
  $1~/00$/ {calc()}       # full hour, time to calculate/reset variables
  END {calc()}            # end of file, ditto
  {count+=1; sum+=$3;}    # update variables at each line
' < file.txt
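
A variant of the same idea, offered as a sketch rather than a replacement: instead of resetting when the minutes field is 00, it groups on the first ten characters of the timestamp (YYYYMMDDHH), so a missing :00 sample does not merge two hours. The function name `hourly_avg_awk` and the three-decimal output format are assumptions made for illustration. If the file really were comma-separated, adding `-F,` to the awk call should be the only change needed.

```bash
#!/usr/bin/env bash
# Sketch: per-hour average of column 3, keyed on YYYYMMDDHH.
# hourly_avg_awk is a hypothetical name; it reads stdin or the files given as arguments.
hourly_avg_awk() {
    awk '
        /^#/ || NF < 3 { next }                       # skip comments and short lines
        {
            hour = substr($1, 1, 10)                  # YYYYMMDDHH
            if (!(hour in count)) order[++n] = hour   # remember first-seen order
            sum[hour] += $3
            count[hour]++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "The average of hour %s: %.3f\n", order[i], sum[order[i]] / count[order[i]]
        }
    ' "$@"
}

# Small demonstration with inline data.
hourly_avg_awk <<'EOF'
#date(YYYYMMDDHHmm) total     used
201811270000        10        3
201811270010        10        4
201811270100        10        5
EOF
# prints:
# The average of hour 2018112700: 3.500
# The average of hour 2018112701: 5.000
```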

Pure bash would be herculean, as you'd need to implement a floating-point arithmetic library first. :)
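
For integer samples, the floating-point library can be sidestepped with scaled integer arithmetic. The sketch below uses only bash builtins and assumes non-negative integer values in the third column with no leading zeros. The function names `hourly_average` and `print_average` and the rounding to three decimals are illustrative choices, not part of the answer above.

```bash
#!/usr/bin/env bash
# Sketch: per-hour average using only bash builtins (fixed-point, 3 decimals).
# Assumes column 1 is YYYYMMDDHHmm and column 3 is a non-negative integer.

print_average() {
    local hour=$1 sum=$2 count=$3
    # Scale by 1000 and add count/2 so the integer division rounds to nearest.
    local scaled=$(( (sum * 1000 + count / 2) / count ))
    printf 'The average of hour %s: %d.%03d\n' "$hour" $(( scaled / 1000 )) $(( scaled % 1000 ))
}

hourly_average() {
    local stamp total used hour prev="" sum=0 count=0
    while read -r stamp total used || [[ -n $stamp ]]; do
        [[ -z $stamp || $stamp == \#* ]] && continue    # skip blanks and comments
        hour=${stamp:0:10}                               # YYYYMMDDHH
        if [[ -n $prev && $hour != "$prev" ]]; then      # hour changed: flush
            print_average "$prev" "$sum" "$count"
            sum=0 count=0
        fi
        prev=$hour
        (( sum += used, count += 1 ))
    done
    if (( count )); then
        print_average "$prev" "$sum" "$count"
    fi
}

# Small demonstration with inline data.
hourly_average <<'EOF'
#date(YYYYMMDDHHmm) total     used
201811270000        10        3
201811270010        10        4
201811270100        10        5
EOF
# prints:
# The average of hour 2018112700: 3.500
# The average of hour 2018112701: 5.000
```

This only works because the inputs are integers; fractional samples would still need an external tool such as awk or bc.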

I've always hated `awk`, but you guys are making me appreciate it. :) — Paul Hodges, Nov 28 '18 at 15:59

Michael Kargl · Answer 2 · 2018-11-28T11:53:00.167

I'd use "tr" to squeeze runs of spaces so that each line becomes a set of single-space-separated fields, then "cut" out the fields needed to calculate the average. If the format gets more complex, you can always enhance the getFieldAtPosition function.

I don't have a full-fledged bash available at the moment, so I iterate over an array instead of reading from file input. For a way to read a file line by line, see this answer:

https://stackoverflow.com/a/10929511/1177024

Short bash-only version:

    function average {
       local sum=$1
       local count=$2
       local floatingPointUnits=2

       # https://linux.die.net/man/1/dc
       echo "${floatingPointUnits}k" "$sum" "$count" /p | dc
    }

    function getFieldAtPosition {
        local line=$1
        local position=$2

        echo "$line"  | tr -s ' ' | cut -d ' ' -f $position
    }

    function parseHourFromDate {
        local date=$1
        local positionOfHour=4+2+2
        local lengthOfHour=2

        echo ${date:positionOfHour:lengthOfHour}
    }

    lines=('201811270000        10        3      ' \
        '201810270020        7        2      ' \
        '201811270100        10        3      ' \
        '201810270140        22        2      ' \
        '201811271000        33        3      ' )

    sum=0
    count=0
    declare -A HOURS
    for line in "${lines[@]}"; do
        date=$(getFieldAtPosition "$line" 1)
        number=$(getFieldAtPosition "$line" 3)   # field 3 is the "used" column that the question averages
        hour=$(parseHourFromDate "$date")

        # new hour, reset
        if [ "$hour" != "$previousHour" ]; then
           sum=0
           count=0
        fi

        sum=$((sum+number))
        count=$((count+1))

        # save average in associative array
        HOURS[$hour]=$(average "$sum" "$count")
        previousHour=$hour
    done


    # print results
    for key in "${!HOURS[@]}"; do
        echo "Average of $key: ${HOURS[$key]}"
    done
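
To feed the loop above from a file instead of the hard-coded array, something along these lines might work. It is a sketch under assumptions: `load_lines` and `split_fields` are hypothetical helper names, the global array is called `lines` to match the answer, and each data line is assumed to hold three whitespace-separated fields. Using `read` to split a line also avoids the `tr` and `cut` calls.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the global array "lines" with the non-comment,
# non-blank lines of the file given as $1.
load_lines() {
    local file=$1 line
    lines=()
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -z ${line//[[:space:]]/} || $line == \#* ]] && continue
        lines+=("$line")
    done < "$file"
}

# Hypothetical helper: split one line into the globals date, total and used.
# read runs in the current shell here because a here-string is not a pipeline.
split_fields() {
    read -r date total used <<< "$1"
}

# Small demonstration using a temporary file.
sample=$(mktemp)
printf '%s\n' '#date(YYYYMMDDHHmm) total     used' \
              '201811270000        10        3' \
              '201811270010        10        4' > "$sample"

load_lines "$sample"
for line in "${lines[@]}"; do
    split_fields "$line"
    echo "date=$date total=$total used=$used"
done
rm -f "$sample"
```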

`dc`, `cut`, `tr` are not bash. When I said "bash-only", I meant bash builtins and nothing else. If you can use external programs, why not use _nicer_ external programs? :) — Amadan, Nov 29 '18 at 04:10

score 0 · Answer 3 · answered Nov 28 '18 at 12:57

Using Perl

> cat ivan.txt
201811270000        10        3
201811270010        10        4
201811270020        10        5
201811270030        10        9
201811270040        10        8
201811270050        10        2
201811270100        10        5
201811270110        10        1
201811270120        10        7
201811270130        10        6
201811270140        10        5
201811270150        10        2
201811270200        10        1
> perl -F'/\s+/'  -lane ' { $F[0]=~s/..$//g;push @{$datekv{$F[0]}},$F[2];} END { for my $x (sort keys %datekv){ $total=0;$z=0; foreach(@{$datekv{$x}}) {$total+=$_;$z++ } print $x,"\t",$total/$z }}' ivan.txt
2018112700      5.16666666666667
2018112701      4.33333333333333
2018112702      1
>

score 0 · Answer 4 · answered Nov 28 '18 at 15:59

Using bash and bc to calculate:

PROCESS_FILE="file.txt"
PROCESSED_DATE=""

while read -r line; do
        if [[ $line =~ ^# ]]; then
                 continue;
        fi

        LINE_DATE=${line:0:10}
        if [[ $PROCESSED_DATE != *"$LINE_DATE"* ]]; then
                PROCESSED_DATE+=",$LINE_DATE"
                USED_LIST=$(grep "^$LINE_DATE" "$PROCESS_FILE" | sed 's/  */,/g' | cut -d ',' -f3 | tr '\n' ' ')
                COUNT=0;
                SUM=0;
                for USED in $USED_LIST; do
                        COUNT=$(echo "$COUNT + 1" | bc -l);
                        SUM=$(echo "$SUM + $USED" | bc -l);
                done

                if [ $COUNT -ne 0 ]; then
                        AVG=$(echo "$SUM/$COUNT" | bc -l)
                fi
                echo "The average of hour $LINE_DATE: $AVG"
        fi

done < "$PROCESS_FILE"

Ivo Yordanov · Answer 5 · 2018-11-28T19:30:35.740

score -1

Here is a short (if somewhat brute-force) way to do it in bash:

calc() {
    awk "BEGIN { print $* }"
}

IFS=$'\r\n' GLOBIGNORE='*' command eval  'memory=($(<'$1'))'
for (( i = 0; i < ${#memory[@]}; i++ )); do
    # values.txt accumulates every sample seen so far, so each printed
    # figure is a running average up to and including the current line.
    echo "${memory[i]}" | awk '{print $1" "$3}' >> values.txt
    total=$(awk '{ (Values += $2) } END { printf "%0.0f", Values }' values.txt)
    length=$(awk '{print $2}' values.txt | wc -l)
    echo "The average of hour $(awk '{print $1}' values.txt | tail -n1): $(calc ${total}/${length})"
done
rm values.txt

The result of the execution is the following:

ivo@spain-nuc-03:~/Downloads/TestStackoverflow$ ./processing.sh test.csv 
The average of hour 201811270000: 3
The average of hour 201811270010: 3.5
The average of hour 201811270020: 4
The average of hour 201811270030: 5.25
The average of hour 201811270040: 5.8
The average of hour 201811270050: 5.16667
The average of hour 201811270100: 5.14286
The average of hour 201811270110: 4.625
The average of hour 201811270120: 4.88889
The average of hour 201811270130: 5
The average of hour 201811270140: 5
The average of hour 201811270150: 4.75
The average of hour 201811270200: 4.46154
ivo@spain-nuc-03:~/Downloads/TestStackoverflow$

You can later change the output to forward it to a file. There are more elegant ways of doing this for experienced bash users.

For Paul Hodges:

Awk is used to select the specific column in question, as we don't know whether that column has the same width throughout the file (still applies).

tr -d is necessary as the value of the variable needs to be an integer and not a string (only at the command line):

This is a string:

ivo@spain-nuc-03:~/Downloads/ScriptsClientes/BashReports/Tools/TextProcessing$ cat values.txt | wc -l
13
ivo@spain-nuc-03:~/Downloads/ScriptsClientes/BashReports/Tools/TextProcessing$

This is an integer:

ivo@spain-nuc-03:~/Downloads/ScriptsClientes/BashReports/Tools/TextProcessing$ cat values.txt | wc -l | tr -d '\n'
13ivo@spain-nuc-03:

Additionally, just doing wc -l file returns the following (still applies):

ivo@spain-nuc-03:~/Downloads/ScriptsClientes/BashReports/Tools/TextProcessing$ wc -l values.txt 
13 values.txt
ivo@spain-nuc-03:~/Downloads/ScriptsClientes/BashReports/Tools/TextProcessing$

Not at all suitable for the task at hand as it forces you to filter out the name of the file.

Please be sure before criticizing.

edited Nov 28 '18 at 19:30

answered Nov 28 '18 at 12:35

Ivo Yordanov


Why `tr` out newlines the parser strips anyway? [Useless use of `cat`](http://porkmail.org/era/unix/award.html#cat) - `memory=$( – Paul Hodges Nov 28 '18 at 15:56
That said, bravo for submitting an answer! Please don't stop doing that. It puts you head and shoulders above the folk who never give back. Good job. Just make a point of always improving. ;) – Paul Hodges Nov 28 '18 at 15:57
@PaulHodges tr is to remove the \n at the end of a value... When you do wc -l you get a return line at the end: – Ivo Yordanov Nov 28 '18 at 16:31
@PaulHodges I have learned from the question I opened what you showed me regarding cat and avoid using repeating some commands... This is a slow process of improvement and it takes practice... Practice makes perfect... Thanks for pointing out my mistakes but here you aren't right. – Ivo Yordanov Nov 28 '18 at 16:48
@PaulHodges Check the update on my answer for details... – Ivo Yordanov Nov 28 '18 at 16:55
@PaulHodges Regarding cat you are right! Didn't know $( – Ivo Yordanov Nov 28 '18 at 17:41
@PaulHodges One last thing. He asked for a bash script, not an awk one ;) – Ivo Yordanov Nov 28 '18 at 17:47
Again, apologies if it feels like I'm picking on you. I just want to make sure people know `length=$( wc -l < x )` doesn't require another process to strip newlines. `awk '{print $2}' x` will print a line for every line of `x` whether that line has a `$2` or not. I'm really not picking on you, and it's obvious that you aren't a clueless neophyte, but the newline is stripped before the assignment anyway, and `13` in bash is a token, pretty much always a string unless used in an integer context. If I hit a nerve, again, I'm sorry. – Paul Hodges Nov 28 '18 at 17:58
@PaulHodges wc -l generates a new line. Please look at the answer again and compare when tr -d is present and when it is not. The definition would work either way. In one case you are defining an integer, in the other a string. Don't worry about criticizing, I already have "stripping useless cat" from all my scripts on my "to do" list thanks to you. I got on here to learn as I don't get feedback on the code I do at work. – Ivo Yordanov Nov 28 '18 at 18:06
Try this: `length=$( wc -l < x ); echo "[$length]"` The newline already gets stripped during the assignment. I understand what you are doing, but newlines are tokenizing characters unless explicitly protected. You don't have to do it for the assignment. – Paul Hodges Nov 28 '18 at 18:18
@PaulHodges for this assignment not, but it is always better to work with the proper data type in case you need to do comparisons... – Ivo Yordanov Nov 28 '18 at 18:38
bash [doesn't have data types](https://www.tldp.org/LDP/abs/html/untyped.html), though you could `declare -i length=$( wc -l < x )` to be sure. – Paul Hodges Nov 28 '18 at 18:41
@PaulHodges Numeric vs String comparisons: https://www.linuxtechi.com/compare-numbers-strings-files-in-bash-script/ Notice a difference? – Ivo Yordanov Nov 28 '18 at 18:43
@PaulHodges From tldp: https://www.tldp.org/LDP/abs/html/comparison-ops.html#EX13 Look at Example 7-5. "# Here "a" and "b" can be treated either as integers or strings. # There is some blurring between the arithmetic and string comparisons, #+ since Bash variables are not strongly typed. # Bash permits integer operations and comparisons on variables #+ whose value consists of all-integer characters. # Caution advised, however." – Ivo Yordanov Nov 28 '18 at 18:48
These prove my point. The values are the same, the operators are different. Can you show me an example where the `tr` is required? – Paul Hodges Nov 28 '18 at 19:09
@PaulHodges It proves my point that they are not. Even the tldp manual refers to them differently: 13 is a number (integer), 13\n is a string. You could not use any of the greater-than or less-than operators (-gt, -lt) that appear in the manual with a string. – Ivo Yordanov Nov 28 '18 at 19:16
But the newline *would never have been there*. That's my point. You are calling `tr` in a situation that only applies from the command line - never in an assignment. Please test it. – Paul Hodges Nov 28 '18 at 19:18
@PaulHodges Aha... I learned from Pandora FMS monitoring system to always put tr -d for the trailing \n and do things first in the command line. Another thing to put on my "to do" list... Thanks!! – Ivo Yordanov Nov 28 '18 at 19:24
It's an awesome tool to have in the box, and you *should* always test it on the command line first! :D – Paul Hodges Nov 28 '18 at 19:35
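
The point settled in the comments above can be checked with a short script. This is a sketch with a throwaway temporary file; the variable names `with_tr` and `without_tr` are made up for the demonstration. Command substitution removes trailing newlines before the assignment, so both variables should end up holding the same value.

```bash
#!/usr/bin/env bash
# Sketch: show that $( ... ) already strips the trailing newline from wc -l.
tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"          # a three-line file

with_tr=$(wc -l < "$tmp" | tr -d '\n')
without_tr=$(wc -l < "$tmp")

# Brackets make any stray whitespace or newline visible.
printf 'with tr:    [%s]\n' "$with_tr"
printf 'without tr: [%s]\n' "$without_tr"

if [[ $with_tr == "$without_tr" ]]; then
    echo "identical: the extra tr process changes nothing in an assignment"
fi

# The value works in an integer context without any conversion.
if (( without_tr > 2 )); then
    echo "usable in arithmetic as-is"
fi

rm -f "$tmp"
```

Some `wc` implementations pad the count with leading spaces; that padding appears in both variables alike and does not affect the arithmetic test.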
