-1

This is the sample input (the data has user-IDs and the number of hours spent by the user):

Computer ID,User ID,M,T,W,T,F
Computer1,User3,5,7,3,5,2
Computer2,User5,8,8,8,8,8
Computer3,User4,0,8,0,8,4
Computer4,User1,5,4,5,5,8
Computer5,User2,9,8,10,0,0

I need to read the data, find all User-IDs ending in even numbers (2,4,6,8..) and find average number of hours spent (over five days).

I wrote the following script:

hoursarray=(0,0,0,0,0)
while IFS=, read -r col1 col2 col3 col4 col5 col6 col7 || [[ -n $col1 ]]
do
    if [[ $col2 == *"2" ]]; then
        #echo "$col2"
        ((hoursarray[0] = col3 + col4 + col5 + col6 + col7))
    elif  [[ $col2 == *"4" ]]; then 
        #echo "$col2"
        ((hoursarray[1] = hoursarray[1] + col3 + col4 + col5 + col6 + col7))
    elif [[ $col2 == *"6" ]]; then
        #echo "$col2"
        ((hoursarray[2] = hoursarray[2] + col3 + col4 + col5 + col6 + col7))
    elif [[ $col2 == *"8" ]]; then
        #echo "$col2"
        ((hoursarray[3] = hoursarray[3] + col3 + col4 + col5 + col6 + col7))
    elif [[ $col2 == *"10" ]]; then
        #echo "$col2"
        ((hoursarray[4] = hoursarray[4] + col3 + col4 + col5 + col6 + col7))
    fi
done < <(tail -n+2 user-list.txt)
echo ${hoursarray[0]}
echo "$((hoursarray[0]/5))"

This is not a very good way of doing this. Also, the numbers arent adding up correctly.

I am getting the following output (for the first one - user2):

27
5

I am expecting the following output:

27
5.4

What would be a better way to do it? Any help would be appreciated.

TIA

RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
saadurr
  • 57
  • 10
  • given the sample input, please update the question with a) the (wrong) output being generated by your script and b) the (correct) desired output – markp-fuso Nov 20 '20 at 16:22
  • `hoursarray=(0,0,0,0,0)` generates an array with a single item containing the value `0,0,0,0,0` ... `typeset -p hoursarray` => `declare -a hoursarray=([0]="0,0,0,0,0")` ; replace the commas with spaces in order to generate a 5-element array ... `hoursarray=(0 0 0 0 0) ; typeset -p hoursarray` => `declare -a hoursarray=([0]="0" [1]="0" [2]="0" [3]="0" [4]="0")` ... then see if this helps any with the follow-on code – markp-fuso Nov 20 '20 at 16:25
  • 2
    This would be *way easier* with `awk` `perl` `ruby` etc – dawg Nov 20 '20 at 16:26
  • not saying I'd do it this way but `[[ $col2 == *"10" ]]` should probably be `[[ $col2 == *"0" ]]` so as to match on `*20`, `*30`, `*200`, etc – markp-fuso Nov 20 '20 at 16:28
  • Bash does not have floating point arithmetic BTW so you will never get a floating point result... – dawg Nov 20 '20 at 16:40
  • are you looking for the average for each day (ie, 5x averages) or the average across all days (ie, 1x average)?; please update the question to show the expected results for the sample output, ie, you've shown `user2` but have left out `user4` ... so what would the final results look like where you have multiple users? – markp-fuso Nov 20 '20 at 16:42
  • Does this answer your question? [Floating point results in Bash integer division](https://stackoverflow.com/questions/15015809/floating-point-results-in-bash-integer-division) – dawg Nov 20 '20 at 17:06

4 Answers4

0

You issue is echo "$((hoursarray[0]/5))" Bash does not have floating point, so it returns the integer portion only.

Easy to demonstrate:

$ hours=27
$ echo "$((hours/5))"
5

If you want to stick to Bash, you could use bc for the floating point result:

$ echo "$hours / 5.0" | bc -l
5.40000000000000000000

Or use awk, perl, python, ruby etc.

Here is an awk you can parse out. Easily modified to you use (which is a little unclear to me)

awk -F, 'FNR==1{print $2; next} 
     {arr[$2]+=($3+$4+$5+$6+$7) }   
     END{ for (e in arr) print e "\t\t" arr[e] "\t" arr[e]/5 }' file 

Prints:

User ID
User1       27  5.4
User2       27  5.4
User3       22  4.4
User4       20  4
User5       40  8

If you only want even users, filter for User that end in any of 0,2,4,6,8:

awk -F, 'FNR==1{print $2; next} 
         $2~/[24680]$/ {arr[$2]+=($3+$4+$5+$6+$7) } 
         END{ for (e in arr) print e "\t\t" arr[e] "\t" arr[e]/5 }' file

Prints:

User ID
User2       27  5.4
User4       20  4
dawg
  • 98,345
  • 23
  • 131
  • 206
  • Thanks! But the script needs to print averages for only IDs ending in even numbers (User2, User4, ....). – saadurr Nov 20 '20 at 16:54
0

Your description is fairly imprecise, but here's an attempt primarily based on the sample output:

awk -F, '$2~/[24680]$/{for(i=3;i<=7;i++){a+=$i};print a;printf "%.2g\n",a/5; a=0}' file 
20
4
27
5.4

$2~/[24680]$/ makes sure we only look at "even" user-IDs.

for(i=3;i<=7;i++){} iterates over the day columns and adds them.

Edit 1: Accommodating new requirement:

awk -F, '$2~/[24680]$/{for(i=3;i<=7;i++){a+=$i};printf "%s\t%.2g\n",$2,a/5;a=0}' saad 
User4   4
User2   5.4
tink
  • 14,342
  • 4
  • 46
  • 50
  • Is there any way that only user-ids ending in an even number can be printed only? – saadurr Nov 20 '20 at 16:52
  • I did that ... why do you think it doesn't do that? the output you're seeing is only for user 2 and 4 – tink Nov 20 '20 at 16:53
  • I tried to run the script. Its printing out averages for all user-ids – saadurr Nov 20 '20 at 16:56
  • How can I modify it to print in `userID averageOfUserID` format? – saadurr Nov 20 '20 at 16:56
  • No, it doesn't. Not with your sample data; if reality differs from the sample fix the sample. And yes, that's easy; but again, it doesn't match the "expected output" in your question. – tink Nov 20 '20 at 16:57
  • One thing, can you explain this part a bit more? `$2~/[24680]$/` – saadurr Nov 20 '20 at 17:04
  • Sure ... `$2` is the column that holds the user ID. `~` is a regular expression match. `[24680]$` is the regular expression. What all together means is: only process lines where the text in the 2nd column ends in an even number `$` means "end of field". – tink Nov 20 '20 at 17:07
0

Sample data showing userIDs with even and odd endings, userID showing up more than once (eg, User2), and some non-integer values:

$ cat user-list.txt
Computer ID,User ID,M,T,W,T,F
Computer1,User3,5,7,3,5,2
Computer2,User5,8,8,8,8,8
Computer3,User4,0,8,0,8,4
Computer4,User1,5,4,5,5,8
Computer5,User2,9,8,10,0,0
Computer5,User120,9,8,10,0,0
Computer5,User2,4,7,12,3.5,1.5

One awk solution to find total hours plus averages, across 5x days, with duplicate userIDs rolled into a single set of numbers, but limited to userIDs that end in an even number:

$ awk -F',' 'FNR==1 { next } $2 ~ /[02468]$/ { tot[$2]+=($3+$4+$5+$6+$7) } END { for ( i in tot ) { print i, tot[i], tot[i]/5 } }' user-list.txt

Where:

  • -F ',' - use comma as input field delimiter
  • FNR==1 { next } - skip first line
  • $2 ~ /[02468]$/ - if field 2 ends in an even number
  • tot[$2]+=($3+$4+$5+$6+$7) - add current line's hours to array where userID is the array index; this will add up hours from multiple input lines (for same userID) into a single array cell
  • for (...) { print ...} - loop through array indices printing the index, total hours and average hours (total divided by 5)

The above generates:

User120 27 5.4
User2 55 11
User4 20 4

Depending on OPs desired output the print can be replaced with printf and the desired format string ...

markp-fuso
  • 28,790
  • 4
  • 16
  • 36
0
 Here is your script modified a little bit:
  
 while IFS=, read -r col1 col2 col3 || [[ -n $col1 ]]
 do
       (( $(sed 's/[^[:digit:]]*//' <<<$col2) % 2 )) || ( echo -n "For $col1 $col2 average is: " && echo "($(tr , + <<<$col3))/5" | bc -l )
 done < <(tail -n+2 list.txt)

prints:

 For Computer3 User4 average is: 4.00000000000000000000
 For Computer5 User2 average is: 5.40000000000000000000