When building an R package, the build command prints its progress to stdout. From that output I would like to capture the final name of the package.

The simulated script below reproduces the output of the build command. The part that needs to be captured is the file name on the last line, the one starting with "* building".

How do I get the regex to match with these quotes, and then capture the package name into a variable?

#!/usr/bin/env bash

var=$(cat <<"EOF"
Warning message:
* checking for file ‘./DESCRIPTION’ ... OK
* preparing ‘analysis’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
Removed empty directory ‘analysis/.idea/inspectionProfiles’
Removed empty directory ‘analysis/.idea/snapshots’
* creating default NAMESPACE file
* building ‘analysis_0.1.tar.gz’
EOF
)

regex="building [\u2018](.*?)?[\u2019]"

if [[ "${var}" =~ $regex ]]; then
  pkgname="${BASH_REMATCH[1]}"
  echo "${pkgname}"
else
  echo "sad face"
fi

This should work on both macOS and CentOS.

katahdin
  • You have double `regex` declarations. Why? Have you just tried `regex='building ‘([^’]*)’'` ([demo](https://ideone.com/IM69dc))? – Wiktor Stribiżew Mar 15 '19 at 12:29
  • [Parameter Substitution](https://www.tldp.org/LDP/abs/html/parameter-substitution.html) can obtain the package name too ([demo](https://ideone.com/IdzE4Z)) – RobC Mar 16 '19 at 20:52
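
As a rough illustration of the parameter-substitution idea from the comment above (a sketch only, not necessarily what the linked demo does), the name can be cut out with two expansions. The shortened sample output is an assumption, and the literal ‘ and ’ in the patterns assume the script and the build output use the same encoding.

```shell
#!/usr/bin/env bash
# Sample output, shortened from the question.
var=$(cat <<"EOF"
* creating default NAMESPACE file
* building ‘analysis_0.1.tar.gz’
EOF
)

# Remove the longest prefix ending in "building ‘".
pkgname="${var##*building ‘}"
# Remove the longest suffix starting at the first ’.
pkgname="${pkgname%%’*}"

echo "${pkgname}"
```

If the output contains no "building ‘" line, the first expansion leaves the text unchanged, so a real script would probably want to check for that case.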

2 Answers

There are many ways to do it, this is one:

file=$(echo "$var" | grep '^\* building' | grep -o '‘.*’' | head -c -4 | tail -c +4)
echo "$file"
  • Find the line starting with * building (first grep)
  • Find the text between ‘’ (second grep)
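  • Discard the quotes, which are 3 bytes each in UTF-8: head drops the last 4 bytes (the closing quote plus the trailing newline) and tail starts output at byte 4, skipping the opening quote. Note that a negative count for head -c is a GNU extension, so this step may not work with the BSD head on macOS.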
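
The pipeline above uses a negative byte count with head, which GNU head accepts but the BSD head shipped with macOS may not. As a sketch of a different technique that avoids byte counting, a single sed substitution can select the line and strip the quotes. This swaps sed in for the grep/head/tail steps; the shortened sample output is an assumption.

```shell
#!/usr/bin/env bash
# Sample output, shortened from the question.
var=$(cat <<"EOF"
* creating default NAMESPACE file
* building ‘analysis_0.1.tar.gz’
EOF
)

# -n suppresses default printing; the p flag prints only the line
# where the substitution succeeded, reduced to the captured name.
file=$(printf '%s\n' "$var" | sed -n 's/^\* building ‘\(.*\)’$/\1/p')
echo "$file"
```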
brunorey

Support for the \u and \U unicode escapes was introduced in Bash 4.2. CentOS 7 has Bash 4.2, so this should work on that platform:

regex=$'.*building[[:space:]]+\u2018(.*)\u2019'

Unfortunately, earlier versions of CentOS had older versions of Bash, and I believe the default version of Bash on macOS is still 3.2. For those, assuming that the quotes are encoded as UTF-8, this should work:

regex=$'.*building[[:space:]]+\xe2\x80\x98(.*)\xe2\x80\x99'
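
For completeness, here is a minimal sketch of that regular expression dropped into the script from the question. The sample output is shortened, and the quotes are assumed to be UTF-8 encoded U+2018 and U+2019.

```shell
#!/usr/bin/env bash
# Sample output, shortened from the question.
var=$(cat <<"EOF"
* creating default NAMESPACE file
* building ‘analysis_0.1.tar.gz’
EOF
)

# \x escapes spell out the UTF-8 bytes of the quotes, so this also
# works on Bash versions without \u support.
regex=$'.*building[[:space:]]+\xe2\x80\x98(.*)\xe2\x80\x99'

# $regex must stay unquoted so it is treated as a regular expression.
if [[ "${var}" =~ $regex ]]; then
  pkgname="${BASH_REMATCH[1]}"
  echo "${pkgname}"
else
  echo "no match" >&2
fi
```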

If the quotes are encoded in different ways on different platforms, then you could use alternation (e.g. (\xe2\x80\x98|...) instead of \xe2\x80\x98) to match all of the possibilities, and adjust the index used for BASH_REMATCH accordingly.
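
A sketch of the alternation idea follows. It assumes the only other variant is a plain ASCII apostrophe, which is a guess rather than something confirmed for every platform; the function name extract_pkgname is made up for the example. Because each quote alternative adds a capture group, the package name moves to BASH_REMATCH[2].

```shell
#!/usr/bin/env bash
# Each alternative is wrapped in its own group; \' is a literal
# apostrophe inside $'...'. The apostrophe variant is an assumption.
open=$'(\xe2\x80\x98|\')'
close=$'(\xe2\x80\x99|\')'
regex=".*building[[:space:]]+${open}(.*)${close}"

# Hypothetical helper: prints the package name found in $1.
extract_pkgname() {
  if [[ "$1" =~ $regex ]]; then
    # Group 1 is the opening quote, group 2 the name.
    printf '%s\n' "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

extract_pkgname "* building ‘analysis_0.1.tar.gz’"
extract_pkgname "* building 'analysis_0.1.tar.gz'"
```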

See How do you echo a 4-digit Unicode character in Bash? for more information about Unicode in Bash.

I've used $'...' to set the regular expression because it supports \x and (from Bash 4.2) \u escapes for characters, and Bash regular expressions don't.
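
To make the difference concrete, this small sketch compares the same escape sequence stored with ordinary single quotes and with $'...'. The variable names are made up for the example, and the result for the plain string is what I would expect with the regex library on a typical Linux system, not a guarantee for every platform.

```shell
#!/usr/bin/env bash
quote="‘"                 # literal U+2018, assumed UTF-8 encoded

plain='\xe2\x80\x98'      # backslashes stay as-is: 12 ordinary characters
ansi=$'\xe2\x80\x98'      # Bash converts the escapes to three bytes

# The regex engine gets the backslash sequences unchanged and does not
# read them as hex escapes, so this is not expected to match.
if [[ "$quote" =~ $plain ]]; then plain_result="match"; else plain_result="no match"; fi

# Here the regex already contains the real bytes of the quote.
if [[ "$quote" =~ $ansi ]]; then ansi_result="match"; else ansi_result="no match"; fi

echo "plain: ${plain_result}"
echo "ansi:  ${ansi_result}"
```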

With regard to the regular expression:

pjh