I wrote a script which populates a file as below:

nodetool="/var/opt/app/cassandra/apache-cassandra-3.11.1/bin/nodetool"

# I need to get some statistics from Cassandra using nodetool
$nodetool tablestats > tmp

# I don't need every statistic, just the following info
grep 'Keyspace\|Pending Flushes\|Table:\|SSTable count:\|Pending flushes:\|last five minutes' tmp > tablestats

# I need an empty line to separate each table's info
sed -i '/Maximum tombstones/a\ \n' tablestats
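The three steps above can also be chained into one pipeline, which avoids the temporary file. The following is only a sketch: `filter_tablestats` is a hypothetical helper name, it uses `grep -E` instead of the `\|` alternation, and it swaps the `a\` command for sed's `G`, which appends an empty line after each matching line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nodetool tablestats` output on stdin and
# writes the filtered report to stdout, with an empty line after each table.
filter_tablestats() {
    grep -E 'Keyspace|Pending Flushes|Table:|SSTable count:|Pending flushes:|last five minutes' |
        sed '/Maximum tombstones/G'   # G appends a newline plus the (empty) hold space
}

# Usage, assuming $nodetool is set as in the script above:
#   "$nodetool" tablestats | filter_tablestats > tablestats
```

Because the helper only reads stdin, it can be tried on a saved copy of the nodetool output before running it against a live node.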

This creates a file named tablestats and populates it as below:

    Keyspace : myKeyspace
      Pending Flushes: 0
        Table: test_table_1
        SSTable count: 0
        Pending flushes: 0
        Average live cells per slice (last five minutes): NaN
        Maximum live cells per slice (last five minutes): 0
        Average tombstones per slice (last five minutes): NaN
        Maximum tombstones per slice (last five minutes): NaN

       Table: student_table
       SSTable count: 4
       Pending flushes: 2
       Average live cells per slice (last five minutes): 2
       Maximum live cells per slice (last five minutes): 5
       Average tombstones per slice (last five minutes): NaN
       Maximum tombstones per slice (last five minutes): 9

       Table: sales_table
       SSTable count: 7
       Pending flushes: 3
       Average live cells per slice (last five minutes): 3
       Maximum live cells per slice (last five minutes): 8
       Average tombstones per slice (last five minutes): 6
       Maximum tombstones per slice (last five minutes): 12

    ...

I need to calculate the following values for each table and insert them at the end of that table's stats:

1- Maximum tombstones / Maximum live cells

2- Average tombstones / Average live cells

I wrote the following script, which does the job:

average_live_cells=0
maximum_live_cells=0
average_tombstones=0
maximum_tombstones=0

touch newFile

while read -r line; do

    # Remember the latest value of each metric (the text after the ':').
    if [[ $line == *"Average live cells"* ]]; then
        average_live_cells=$(echo "$line" | cut -d':' -f 2 | xargs)

    elif [[ $line == *"Maximum live cells"* ]]; then
        maximum_live_cells=$(echo "$line" | cut -d':' -f 2 | xargs)

    elif [[ $line == *"Average tombstones"* ]]; then
        average_tombstones=$(echo "$line" | cut -d':' -f 2 | xargs)

    elif [[ $line == *"Maximum tombstones"* ]]; then
        maximum_tombstones=$(echo "$line" | cut -d':' -f 2 | xargs)
    fi

    # A blank line marks the end of one table's block.
    if [[ ! $line = *[!\ ]* ]]; then

        # Compare as strings: inside [[ ]], -eq treats "NaN" as an unset variable, i.e. 0.
        # A divisor of 0 is also reported as NaN so that bc never divides by zero.
        if [[ $maximum_live_cells == "NaN" || $maximum_tombstones == "NaN" || $maximum_live_cells == 0 ]]; then
            calculated_max="NaN"
        else
            calculated_max=$(echo "scale=2 ; $maximum_tombstones / $maximum_live_cells" | bc)
        fi

        if [[ $average_live_cells == "NaN" || $average_tombstones == "NaN" || $average_live_cells == 0 ]]; then
            calculated_ave="NaN"
        else
            calculated_ave=$(echo "scale=2 ; $average_tombstones / $average_live_cells" | bc)
        fi

        echo "average_tombstones/average_live_cells: $calculated_ave" >> newFile
        echo -e "maximum_tombstones/maximum live_cells: $calculated_max\n" >> newFile

        # Reset the metrics for the next table.
        average_live_cells=0
        maximum_live_cells=0
        average_tombstones=0
        maximum_tombstones=0
    else
        printf '%s\n' "$line" >> newFile
    fi

done < tablestats

The above script creates a file as below:

Keyspace : myKeyspace
Pending Flushes: 0
Table: test_table_1
SSTable count: 0
Pending flushes: 0
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): NaN
average_tombstones/average_live_cells: NaN
maximum_tombstones/maximum live_cells: NaN

Table: student_table
SSTable count: 4
Pending flushes: 2
Average live cells per slice (last five minutes): 2
Maximum live cells per slice (last five minutes): 5
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 9
average_tombstones/average_live_cells: NaN
maximum_tombstones/maximum live_cells: 1.80

Table: sales_table
SSTable count: 7
Pending flushes: 3
Average live cells per slice (last five minutes): 3
Maximum live cells per slice (last five minutes): 8
Average tombstones per slice (last five minutes): 6
Maximum tombstones per slice (last five minutes): 12
average_tombstones/average_live_cells: 2.00
maximum_tombstones/maximum live_cells: 1.50

...

But I think there should be a better solution than looping through the file line by line. Is there a better approach that avoids the loop?

X625
  • It's unclear how you want to improve your solution. What improvement are you looking for? – Erik Elmgren Mar 27 '18 at 16:44
  • @ErikElmgren As I said: is there a better solution that avoids looping through the file? – X625 Mar 27 '18 at 16:46
  • You did not explain what you find problematic with looping through the file – Erik Elmgren Mar 27 '18 at 16:49
  • @ErikElmgren This is a very small part of a bigger script that runs on two data centers with 20 nodes and produces a huge, essential report. I want to minimize execution time and boost performance, and I understand that looping through a file is a bad practice. I am not a bash guy, which is why I ask. – X625 Mar 27 '18 at 16:53
  • My guess is that using subshells, $(), has a bigger performance impact than looping through the entire file. You can also replace cut with the bash builtin read, see https://stackoverflow.com/questions/10586153/split-string-into-an-array-in-bash and http://rus.har.mn/blog/2010-07-05/subshells/ – Erik Elmgren Mar 27 '18 at 17:03
  • (and I suggest adding that you want to improve the *performance* as the very 1st sentence, that will help everyone who reads your question to understand) – Erik Elmgren Mar 27 '18 at 17:19
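To illustrate the point about subshells, here is a minimal sketch of extracting the value after the colon with shell builtins only. It uses parameter expansion rather than `read`, and `stat_value` is a hypothetical helper that returns its result in `REPLY`, so no `$()`, `cut`, or `xargs` process is started per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stores the trimmed text after the first ':' in REPLY.
stat_value() {
    local value=${1#*:}                        # drop everything up to the first ':'
    value=${value#"${value%%[![:space:]]*}"}   # trim leading whitespace
    value=${value%"${value##*[![:space:]]}"}   # trim trailing whitespace
    REPLY=$value
}

stat_value "Average live cells per slice (last five minutes): NaN"
echo "$REPLY"   # prints: NaN
```

Inside the question's loop, this could stand in for each `$(echo $line | cut -d':' -f 2 | xargs)` assignment. Whether it makes a measurable difference depends on how many lines the report has.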

1 Answer

Perl has a nice mode (paragraph mode, enabled with -00) where you can read a file a paragraph at a time:

perl -00 -lpe '
    /Average live.*: (\S+)/ and $al = $1;
    /Maximum live.*: (\S+)/ and $ml = $1;
    /Average tomb.*: (\S+)/ and $at = $1;
    /Maximum tomb.*: (\S+)/ and $mt = $1;
    $ra = ($al and $at and $al > 0 and $at > 0) ? $at/$al : "NaN";
    $rm = ($ml and $mt and $ml > 0 and $mt > 0) ? $mt/$ml : "NaN";
    $_ .= sprintf "\nratio average: %.2f", $ra;
    $_ .= sprintf "\nratio maximum: %.2f", $rm;
' tablestats > newFile
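For comparison, the same paragraph-at-a-time idea can be sketched with awk, where an empty `RS` enables paragraph mode. This is an illustration under assumptions, not part of the Perl solution: `add_ratios` is a hypothetical function name, the output labels are made up for the example, a zero divisor is reported as NaN, and the separator lines in the input are assumed to be truly empty (no stray spaces).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the blank-line-separated report on stdin and
# appends two ratio lines to every paragraph.
add_ratios() {
    awk '
        # Return num/den with two decimals, or "NaN" if either is NaN or den is 0.
        function ratio(num, den) {
            if (num == "NaN" || den == "NaN" || den + 0 == 0) return "NaN"
            return sprintf("%.2f", num / den)
        }
        BEGIN { RS = ""; FS = "\n" }    # one record per paragraph, one field per line
        {
            al = ml = at = mt = "NaN"   # reset for every paragraph
            for (i = 1; i <= NF; i++) {
                v = $i
                sub(/^.*: */, "", v)    # keep only the text after the last colon
                sub(/[ \t]+$/, "", v)   # trim trailing whitespace
                if      ($i ~ /Average live cells/) al = v
                else if ($i ~ /Maximum live cells/) ml = v
                else if ($i ~ /Average tombstones/) at = v
                else if ($i ~ /Maximum tombstones/) mt = v
            }
            printf "%s\naverage_tombstones/average_live_cells: %s\nmaximum_tombstones/maximum_live_cells: %s\n\n", $0, ratio(at, al), ratio(mt, ml)
        }
    '
}

# Usage:
#   add_ratios < tablestats > newFile
```

Unlike the Perl version, this sketch resets the four values for each paragraph, so a paragraph that lacks a metric does not reuse the previous table's number.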
glenn jackman