
I am working with the dataset sampled below. I would like to write a bash script that selects only the records meeting a set of conditions and collects them in another file. The conditions are:

1. The third column must be greater than 3
2. The fourth column must be greater than 3.5
3. The second column must be 8

40462186,177827,7671,4395,190,4.31,0.42
2872296,273870,3492,95349,1216,1.27,9.41
45236699,265691,6874,5873,152,2.58,0.57
77481,40024,153,516565,1975,0.38,51.54

I would be grateful if you could help me to complete it.

Thank you in advance

oshiono

1 Answer

  • Bash variable names cannot contain whitespace.
  • You've misspelled Percentage as Percentatge.
  • You've miscounted the column position of Continent.
  • The regex operator in bash is =~, not ~.
  • You should not enclose the regex in slashes.
  • You will need bc or another external command to do arithmetic on decimal numbers.
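
As a rough illustration of the regex and bc points above, here is a minimal sketch. The function names and the sample values are hypothetical and chosen only for this example; they do not come from the question's script.

```shell
#!/bin/bash

# Succeeds when the argument is one of the three continents in the regex.
# The pattern is left unquoted and has no /.../ delimiters.
is_target_continent() {
    [[ $1 =~ ^(Africa|Asia|Europe)$ ]]
}

# Prints a - b. bc is used because $(( )) handles integers only.
decimal_diff() {
    echo "$1 - $2" | bc
}

# Succeeds when a >= b. bc prints 1 for a true comparison and 0 for a false one.
decimal_ge() {
    [[ $(echo "$1 >= $2" | bc) -eq 1 ]]
}

# Hypothetical sample values (variable names use underscores, not spaces).
continent="Europe"
death_pct="1.27"
survival_pct="9.41"

if is_target_continent "$continent" && decimal_ge "$death_pct" 0.5; then
    echo "diff=$(decimal_diff "$survival_pct" "$death_pct")"
fi
```

Comparing through bc is an alternative to matching the digits with a regex; either approach works for simple thresholds, and this sketch assumes bc is installed.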

With those points fixed, would you please try the following:

#!/bin/bash

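while IFS= read -r line; do
    if (( nr++ == 0 )); then            # header line
        echo "$line,diff.porc.pts"
    else                                # body
        # Split on commas; fields 5, 10 and 11 are the ones tested below.
        IFS=, read -r _ _ _ _ Continent _ _ _ _ pDeath pSurvival <<< "$line"
        # Keep rows on the listed continents with pDeath >= 0.5 and pSurvival >= 2.
        if [[ $Continent =~ ^(Africa|Asia|Europe)$ && $pDeath =~ ^(0\.[5-9]|[1-9]) && $pSurvival =~ ^([2-9]\.|[1-9][0-9]) ]]; then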
            diff=$(echo "$pSurvival - $pDeath" | bc)
            echo "$line,$diff"
        fi
    fi
done < input_file.txt > new_file.txt

Output:

Country,Other names,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Tot Cases//1M pop,Tot Deaths/1M pop,Death percentage, Survival Percentage,diff.porc.pts
Albania,Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.27,9.41,8.14

It looks like only the record for Albania meets the conditions, contrary to the desired output shown in the question.

tshiono
  • However, a much better solution is to write a simple Awk script instead. See also [`while read` loop extremely slow compared to `cat`, why?](https://stackoverflow.com/questions/13762625/bash-while-read-loop-extremely-slow-compared-to-cat-why) – tripleee May 23 '22 at 09:51
  • @tripleee thank you for the suggestion. I strongly agree with your opinion. I was just honestly obeying the OP's requirement: `It is important that it is a bash script that uses these regular expressions in if conditions.` :) – tshiono May 23 '22 at 10:29
  • @tshiono your responses are always helpful. Could you take a look at this, please? https://askubuntu.com/questions/1410054/creating-an-html-from-the-output-of-awk-script/1410091#1410091 – oshiono May 23 '22 at 12:04
  • I've posted an answer to the linked question. I hope it helps. Cheers. – tshiono May 24 '22 at 02:34
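
For reference, the Awk approach suggested in the comments might look roughly like the sketch below. It is not from the thread: the wrapper name `filter_records` and the `Testland` row are hypothetical, and the thresholds (column 10 at least 0.5, column 11 at least 2) are one reading of the regexes in the answer's bash script.

```shell
#!/bin/bash

# filter_records FILE: print the header plus every matching row, each with
# an extra column holding column 11 minus column 10.
filter_records() {
    awk -F, -v OFS=, '
        NR == 1 { print $0, "diff.porc.pts"; next }    # header line
        $5 ~ /^(Africa|Asia|Europe)$/ && $10 >= 0.5 && $11 >= 2 {
            print $0, $11 - $10
        }
    ' "$1"
}

# Small demo: the header and Albania row are from the answer's output;
# the Testland row is made up so that one record fails the thresholds.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
Country,Other names,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Tot Cases//1M pop,Tot Deaths/1M pop,Death percentage, Survival Percentage
Albania,Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.27,9.41
Testland,Testland,TST,1000,Europe,10,1,100,10,0.10,1.00
EOF

filter_records "$demo_file"
rm -f "$demo_file"
```

Awk compares the fields numerically and does the decimal subtraction itself, so neither the digit-matching regexes nor bc are needed here.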