2

I'm working with the following output:

=============================== Coverage summary ===============================
Statements   : 26.16% ( 1681/6425 )
Branches     : 6.89% ( 119/1727 )
Functions    : 23.82% ( 390/1637 )
Lines        : 26.17% ( 1680/6420 )
================================================================================

I would like to extract the four coverage percentages, without the percent sign, into a comma-separated list using a regex.

Any suggestions for a good regex for this? Or is there another good option?

StephenKing
TheJeff
  • 2
    kindly post the expected output too in your post. – RavinderSingh13 Aug 01 '18 at 03:33
  • 2
    Hi Jeff. These _all_ seem like good answers, except maybe the `bash` one, and people have put work into them. I think the `grep` one is the clearest, better than my `sed`. Anyway, I am voting them all up. – Joseph Quinsey Aug 03 '18 at 02:31

5 Answers

3

The sed command:

  sed -n '/ .*% /{s/.* \(.*\)% .*/\1/;p;}' input.txt | sed ':a;N;$!ba;s/\n/,/g'

gives the output:

  26.16,6.89,23.82,26.17

Edit: A better answer, with only a single sed, would be:

  sed -n '/ .*% /{s/.* \(.*\)% .*/\1/;H;};${g;s/\n/,/g;s/,//;p;}' input.txt

Explanation:

  • / .*% / search for lines with a percentage value (note spaces)
  • s/.* \(.*\)% .*/\1/ and delete everything except the percentage value
  • H and then append it to the hold space, prefixed with a newline

  • $ then for the last line

  • g get the hold space
  • s/\n/,/g replace all the newlines with commas
  • s/,// and delete the initial comma
  • p and then finally output the result

To harden the regex, you could replace the pattern that matches the percentage value, .*%, with something stricter, for example [0-9.]*%.
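The hardening suggested above can be sketched as follows. This is only an illustration, not part of the original command: the function name pct_list_sed is hypothetical, and the pattern uses [0-9.][0-9.]* instead of [0-9.]*, on the assumption that at least one digit or dot should be required before the % sign.

```shell
# Hypothetical helper: reads the coverage report from stdin or from file arguments.
pct_list_sed() {
  # Same structure as the single-sed command above, with the stricter number pattern.
  sed -n '/ [0-9.][0-9.]*% /{s/.* \([0-9.][0-9.]*\)% .*/\1/;H;};${g;s/\n/,/g;s/,//;p;}' "$@"
}

pct_list_sed <<'EOF'
=============================== Coverage summary ===============================
Statements   : 26.16% ( 1681/6425 )
Branches     : 6.89% ( 119/1727 )
Functions    : 23.82% ( 390/1637 )
Lines        : 26.17% ( 1680/6420 )
================================================================================
EOF
# prints: 26.16,6.89,23.82,26.17
```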

Joseph Quinsey
2

I think this is a grep job. This should help:

$ grep -oE "[0-9]{1,2}\.[0-9]{2}" input.txt | xargs | tr " " ","

Output:

26.16,6.89,23.82,26.17

The input file contains exactly the output shown in the question. The input can also be fed to the command in other ways, for example by piping it in.

Explanation:

  • grep -oE: only show matches using extended regex
  • xargs: put all results onto a single line
  • tr " " ",": translate the spaces into commas

This is actually a nice shell tool belt example, I would say.


Following Joseph Quinsey's suggestion, the regex can be made more robust by using a Perl-compatible pattern (grep -P) with a lookahead that asserts a % sign after the numeric value:

grep -oP "[0-9]{1,2}\.[0-9]{2}(?=%)" input.txt | xargs | tr " " ","
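
Where grep -P is not available, the same idea can be sketched with an extended regex that includes the % in the match and strips it afterwards with tr, as proposed in the comments. This is a sketch under assumptions: the function name pct_list_grep is hypothetical, the fractional part is made optional so that values such as 8.1% or 100% also match, and paste is used instead of xargs and tr to join the lines.

```shell
# Hypothetical helper: reads the coverage report from stdin or from a file argument.
pct_list_grep() {
  # Match a number followed by %, drop the %, then join the lines with commas.
  grep -oE '[0-9]+(\.[0-9]+)?%' "$@" | tr -d '%' | paste -sd, -
}

printf '%s\n' 'Statements   : 26.16% ( 1681/6425 )' \
              'Branches     : 8.1% ( 140/1729 )' | pct_list_grep
# prints: 26.16,8.1
```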
wp78de
  • Maybe you could add a `%` to the end of your RE, to make it more robust, and then remove the `%` with `tr`? – Joseph Quinsey Aug 03 '18 at 02:43
  • 1
    @JosephQuinsey good idea. I used a PCRE with a lookahead to do so. – wp78de Aug 03 '18 at 03:08
  • Thank you @wp78de, however it has a bit of trouble when it rounds up to one decimal place - I.E Branches 8.1% ( 140/1729 ), so using "[0-9]{1,2}\.[0-9]{1,2}(?=%)" did the trick. Thanks! – TheJeff Aug 12 '18 at 10:16
  • Bonus: https://stackoverflow.com/questions/51807953/parse-comma-separated-string-of-numbers-into-variables-scripting-bash – TheJeff Aug 12 '18 at 10:27
2

Would you consider using awk? Here is a command you could try:

$ awk 'match($0,/[0-9.]*%/){s=(s=="")?"":s",";s=s substr($0,RSTART,RLENGTH-1)}END{print s}' file
26.16,6.89,23.82,26.17

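Brief explanation:

  • match($0,/[0-9.]*%/): select the records (lines) that match the regex [0-9.]*%; match() also sets RSTART and RLENGTH to the position and length of the match
  • s=(s=="")?"":s",": since a comma-separated list is required, append a comma to s before every match except the first one
  • s=s substr($0,RSTART,RLENGTH-1): append the matched text, minus the trailing %, to s
  • END{print s}: print the accumulated list once all lines have been read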
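
The steps above can also be written out over several lines, which may be easier to follow. This is a sketch of the same logic, not a different method: the function name pct_list_awk and the variable name value are hypothetical, and the pattern uses + instead of * on the assumption that at least one digit or dot should precede the % sign.

```shell
# Hypothetical helper: reads the coverage report from stdin or from file arguments.
pct_list_awk() {
  awk '
    match($0, /[0-9.]+%/) {
      # RLENGTH - 1 leaves out the trailing % sign.
      value = substr($0, RSTART, RLENGTH - 1)
      # Put a comma in front of every value except the first one.
      s = (s == "") ? value : (s "," value)
    }
    END { print s }
  ' "$@"
}

printf '%s\n' 'Statements   : 26.16% ( 1681/6425 )' \
              'Branches     : 6.89% ( 119/1727 )' | pct_list_awk
# prints: 26.16,6.89
```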
CWLiu
1

Assuming the item names (Statements, Branches, ...) do not contain whitespace, how about:

#!/bin/bash

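declare -a keys
declare -a values

# Read the report line by line; -r keeps backslashes literal.
while read -r line; do
    # Capture the item name (group 1) and the number before the % sign (group 2).
    if [[ "$line" =~ ^([^\ ]+)\ *:\ *([0-9.]+)% ]]; then
        keys+=("${BASH_REMATCH[1]}")
        values+=("${BASH_REMATCH[2]}")
    fi
done < output.txt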

ifsback=$IFS        # backup IFS
IFS=,
echo "${keys[*]}"
echo "${values[*]}"
IFS=$ifsback        # restore IFS

which yields:

Statements,Branches,Functions,Lines
26.16,6.89,23.82,26.17
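
As a variation on the IFS backup and restore in the script, the join can be wrapped in a function that changes IFS only locally. This is a minimal sketch, assuming a shell that supports local (bash does); join_by_comma is a hypothetical helper name that does not appear in the script above.

```shell
# Hypothetical helper: joins its arguments with commas.
join_by_comma() {
  local IFS=,            # the change is confined to this function
  printf '%s\n' "$*"     # "$*" joins the arguments with the first character of IFS
}

join_by_comma 26.16 6.89 23.82 26.17
# prints: 26.16,6.89,23.82,26.17
```

In the script this would be called as join_by_comma "${keys[@]}" and join_by_comma "${values[@]}".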
tshiono
1

Yet another option, with perl:

cat the_file | perl -e 'while(<>){/(\d+\.\d+)%/ and $x.="$1,"}chop $x; print $x;'

The code, unrolled and explained:

while(<>){  # Read line by line. Put lines into $_
  /(\d+\.\d+)%/ and $x.="$1,"
  # Equivalent to:
  # if ($_ =~ /(\d+\.\d+)%/) {$x.="$1,"}
  # The regex matches "numbers", "dot", "numbers" and "%", 
  # stores just numbers on $1 (first capturing group)
}
chop $x; # Remove extra ',' and print result
print $x;

Somewhat shorter, with an extra sed:

cat the_file | perl -ne '/(\d+\.\d+)%/ and print "$1,"'|sed 's/.$//'

This uses the -n switch, which wraps the code in an implicit while(<>){...} loop. sed then removes the trailing ','.

Julio