Split 1 argument into 2 arguments using regexp in a bash script

Question

Here's my situation. Currently, I have a script that accepts two arguments: book name and chapter name. For example:

$ myscript book1 chap1

Now, for reasons that would take a long time to explain, I would prefer my script to be able to take a single argument of the following format: {book name}.{chapter name}. For example:

$ myscript book1.chap1

The difficulty for me is that I do not know how to take a string $1=abc.xyz and turn it into two separate variables, $var1=abc and $var2=xyz. How can I do this?

smocking · Accepted Answer · score 18 · 2012-07-10T16:25:36.173

If there are just two tags, you can use bash parameter expansion:

arg=$1
beforedot=${arg%.*}
afterdot=${arg#*.}

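It's faster than cut because the shell does it itself, without starting an external process. Note that this puts everything before the last dot into beforedot and everything after the first dot into afterdot, so if the string contains more than one dot, the middle part ends up in both variables.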
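To see which part ends up where, here is a small sketch comparing the single and doubled operators on a string with several dots (the variable names are only for illustration). A single % or # removes the shortest match, and a doubled %% or ## removes the longest.

```shell
#!/bin/bash
# Sample string with more than one dot, for illustration.
arg="a.b.c.d"

before_last=${arg%.*}    # shortest ".*" suffix removed  -> a.b.c
before_first=${arg%%.*}  # longest ".*" suffix removed   -> a
after_first=${arg#*.}    # shortest "*." prefix removed  -> b.c.d
after_last=${arg##*.}    # longest "*." prefix removed   -> d

printf '%s\n' "$before_last" "$before_first" "$after_first" "$after_last"
```

So for the book.chapter format, `${1%%.*}` and `${1#*.}` split at the first dot, while `${1%.*}` and `${1##*.}` split at the last one.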

EDIT:

There's also a substitution/reinterpretation construct if you want to split into an arbitrary number of tokens:

string=a.b.c.d.e
tokens=(${string//\./ })

You're replacing the dots with spaces, and because of the parentheses around the expansion, the resulting words are interpreted as an array declaration and definition.

However, I've found this to be less portable to bash's siblings and offspring. For example, it doesn't work in my favourite shell, zsh.

Arrays need to be dereferenced with braces and are indexed from 0:

echo "Third token: ${tokens[2]}"

You can loop through them as well by dereferencing the whole array with [@]:

for i in "${tokens[@]}"
do
    echo "$i"   # do stuff
done
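A runnable sketch putting the snippets above together with the same sample string. It assumes the default IFS, because the spaces introduced by the substitution are what word splitting breaks on, and the unquoted expansion is also subject to filename globbing. The last two lines show a different technique, not the one in this answer: `read -a` with IFS set to a dot, which splits without globbing.

```shell
#!/bin/bash
string=a.b.c.d.e

# Replace every dot with a space; the unquoted result is word-split
# (on the default IFS) into array elements. Tokens containing
# *, ? or [ could also be glob-expanded against file names.
tokens=(${string//\./ })

echo "Token count: ${#tokens[@]}"
echo "Third token: ${tokens[2]}"

# Quote the [@] expansion so each element stays a single word.
for i in "${tokens[@]}"
do
    echo "token: $i"
done

# Alternative: let read split directly on dots; no globbing involved.
IFS=. read -r -a tokens2 <<<"$string"
echo "Via read: ${#tokens2[@]} tokens"
```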

edited Jul 10 '12 at 16:25

answered Jul 10 '12 at 15:02

smocking

What happens if the string contains more than one dot? – mouviciel Jul 10 '12 at 15:09

If `$arg="start.middle.end"`, then afterwards `$beforedot="start.middle"`, and `$afterdot="middle.end"`. The middle part would be duplicated. – vergenzt Jul 10 '12 at 15:35

@mouviciel: the section between dots winds up in both variables (e.g. if $1="a.b.c.d", it'll set $beforedot="a.b.c" and $afterdot="b.c.d"). If you use `beforedot=${arg%%.*}`, you'll get just the part before the first dot in $beforedot, which might make more sense. – Gordon Davisson Jul 10 '12 at 15:38
The name of this sort of thing is parameter expansion for those who want further reading: http://wiki.bash-hackers.org/syntax/pe – CasualScience Mar 10 '16 at 19:20

score 2 · Answer 2 · answered Jul 11 '12 at 01:38

For completeness and since you asked about a regex method:

pattern='^([^.]*)\.(.*)'
[[ $1 =~ $pattern ]]
book=${BASH_REMATCH[1]}
chapter=${BASH_REMATCH[2]}

The capture groups are elements in the BASH_REMATCH array. Element 0 contains the whole match.

This regex will capture everything up to the first dot in the first element. Anything after the first dot, including subsequent dots, will be in the second element. The regex can easily be modified to break on the last dot if needed.
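One way to make that modification, sketched with a hypothetical sample argument: make the first group greedy and forbid dots in the second, so the literal `\.` has to match the last dot. The `if` also guards against input with no dot, in which case BASH_REMATCH would be empty.

```shell
#!/bin/bash
arg="book1.part2.chap1"   # sample input, for illustration

# Greedy (.*) takes as much as it can, so the \. that follows matches
# the LAST dot; ([^.]*)$ then captures the dot-free remainder.
pattern='^(.*)\.([^.]*)$'

if [[ $arg =~ $pattern ]]; then
    book=${BASH_REMATCH[1]}
    chapter=${BASH_REMATCH[2]}
    printf 'book: %s\nchapter: %s\n' "$book" "$chapter"
else
    echo "no dot in '$arg'" >&2
fi
```

Here book becomes book1.part2 and chapter becomes chap1. As in the answer, the pattern is kept in a variable and left unquoted on the right of `=~` so bash treats it as a regex.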

Brian Agnew · Answer 3 · score 1 · 2012-07-10T15:21:21.533

If $arg contains book.chap

read BOOK CHAP<<<$(IFS="."; echo $arg)

will set the variables BOOK and CHAP accordingly. This uses the bash internal field separator (IFS), which controls how bash determines word boundaries. If (say) your original $arg contains multiple separators, just specify further variables to hold the results.
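A small sketch of the "further variables" case, using a hypothetical three-part argument. Inside the command substitution IFS is a dot, so echo prints the fields separated by spaces; read then splits that line again on the default whitespace IFS.

```shell
#!/bin/bash
arg="book1.part2.chap1"   # sample input with two separators

# Split on "." inside the subshell, then read the space-separated words.
read BOOK PART CHAP <<<$(IFS="."; echo $arg)

printf '%s / %s / %s\n' "$BOOK" "$PART" "$CHAP"
```

With only BOOK and CHAP on the read line, CHAP would instead receive the leftover words, "part2 chap1".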

From here:

$IFS defaults to whitespace (space, tab, and newline), but may be changed, for example, to parse a comma-separated data file

edited Jul 10 '12 at 15:21

answered Jul 10 '12 at 15:05

Brian Agnew

`IFS=. read -r book chapter <<<"$1"` is cleaner. No need for the `echo` or the command substitution. – Dennis Williamson Jul 11 '12 at 01:32
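A minimal sketch of that one-liner; the `set --` line only supplies a sample argument for illustration. When there are more fields than names, read assigns the remaining words and their intervening delimiters to the last name, so extra dots stay in chapter and the split happens at the first dot.

```shell
#!/bin/bash
# Sample argument, for illustration; in the script this comes from the caller.
set -- book1.chap1.extra

# IFS=. applies only to this read; -r keeps backslashes literal.
IFS=. read -r book chapter <<<"$1"

printf 'book: %s\nchapter: %s\n' "$book" "$chapter"
```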

score 0 · Answer 4 · answered Jul 10 '12 at 15:00

You can use parentheses to capture the two parts; afterwards, you can use backreferences to grab them again. The syntax differs between languages; check http://www.regular-expressions.info/brackets.html for a lesson on backreferences in general.

Palladium

Thanks for the info. Yes, I've done this type of grouping regex matching in Python, but I've never used regexes in bash scripts. – synaptik Jul 10 '12 at 18:11

score 0 · Answer 5 · answered Jul 10 '12 at 15:04

#!/bin/bash

book=${1%.*}
chapter=${1#*.}

printf 'book: %s\nchapter: %s\n' "$book" "$chapter"
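One thing to watch: if the argument contains no dot, `${1%.*}` and `${1#*.}` both leave it unchanged, so book and chapter would both be set to the whole string. A minimal sketch of a guard, where split_arg is a hypothetical helper name and the usage message is only an example:

```shell
#!/bin/bash
# Hypothetical helper: sets the globals book/chapter from BOOK.CHAPTER,
# or prints a usage message and returns 1 when there is no dot.
split_arg() {
    if [[ $1 != *.* ]]; then
        echo "usage: myscript BOOK.CHAPTER" >&2
        return 1
    fi
    book=${1%.*}
    chapter=${1#*.}
}

split_arg book1.chap1 && printf 'book: %s\nchapter: %s\n' "$book" "$chapter"
split_arg book1chap1 || echo "rejected: no dot"
```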

Thedward

score 0 · Answer 6 · answered Jul 10 '12 at 15:27

Pattern Substitution with Shell Parameter Expansion

There are a lot of ways to accomplish what you're trying to do. One of the ways not covered in other answers is pattern substitution.

If you know that the value will always split correctly on a period, you can apply pattern substitution to the value so that it will be easy to tokenize with IFS. For example:

set -- foo.bar
myvar="${1/./ }"
echo $myvar

This will yield foo bar.
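To finish the tokenizing step with two named variables, the substituted value can be fed to read, which splits on the default whitespace IFS. A sketch using the same sample value; it assumes the book and chapter names contain no spaces, since those would be split as well.

```shell
#!/bin/bash
set -- foo.bar

# ${1/./ } replaces only the first dot with a space; read then splits
# the result on whitespace into two variables.
read -r book chapter <<<"${1/./ }"

printf 'book: %s\nchapter: %s\n' "$book" "$chapter"
```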
