Shell script to split a url and extract the variables

Question

I have a string like "git@github.com:myOrg/my-repo.git" and I am trying to split it to get the substrings "myOrg" and "my-repo".

I tried the script below, but it fails:


REPO=$(echo $SSH_URL | sed  -n 's#.*:\\(.*\\)/.*#\\1#p')

GIT_ORG=$(echo $SSH_URL | sed -n 's/^.*:\\([^\\/]*\\)\\/\\(.*\\)\\.git$/\\1/p')

but I get the following error:

sed: -e expression #1, char 39: unknown option to s

Can someone please help?

As the bash tag you used says, "For shell scripts with syntax or other errors, please check them at https://shellcheck.net before posting them here." Please also read [correct-bash-and-shell-script-variable-capitalization](https://stackoverflow.com/questions/673055/correct-bash-and-shell-script-variable-capitalization). Once you've fixed your code based on that, if you still have problems, edit your question to show the updated code and let us know. - Ed Morton, May 22 '23 at 12:02

score 3 · Accepted Answer · answered May 22 '23 at 12:14


You could use parameter expansion to cut off the unneeded parts:

url="${SSH_URL}"   # make a copy
url="${url#*:}"    # drop everything up to and including the first colon
url="${url%.git}"  # drop the .git extension

owner="${url%/*}"  # extract "myOrg"
repo="${url#*/}"   # extract "my-repo"
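The same expansions can be wrapped in a small function and reused for several URLs. This is a minimal sketch: the function name parse_ssh_url is hypothetical, and it assumes scp-style URLs of the form user@host:owner/repo or user@host:owner/repo.git. It uses ${url##*/} (longest match) for the repo instead of ${url#*/}, so that on a nested path like group/sub/proj both owner and repo split at the same (last) slash.

```shell
#!/bin/bash

# Hypothetical helper: sets the globals owner and repo from an
# scp-style SSH URL such as git@github.com:myOrg/my-repo.git
parse_ssh_url() {
  local url="$1"
  url="${url#*:}"     # drop everything up to and including the first ':'
  url="${url%.git}"   # drop a trailing .git (no-op if absent)
  owner="${url%/*}"   # everything before the last '/'
  repo="${url##*/}"   # everything after the last '/'
}

parse_ssh_url "git@github.com:myOrg/my-repo.git"
echo "$owner $repo"   # myOrg my-repo

parse_ssh_url "git@github.com:myOrg/my-repo"
echo "$owner $repo"   # myOrg my-repo (no .git suffix)
```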

pmf

Fravadona · Answer 2 · score 3 · 2023-05-22T22:14:15.663

You could use the read builtin (edit: added stripping of .git with a parameter expansion):

#!/bin/bash

SSH_URL=git@github.com:myOrg/my-repo.git

IFS=':/' read -r _ git_org git_repo <<< "${SSH_URL%.git}"

$ echo "$git_org"
myOrg
$ echo "$git_repo"
my-repo
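If there are several URLs to process, the same IFS=':/' read split can run in a loop, one URL per line. A sketch, assuming bash (for the here-string and arrays); the sample URL list and the array names orgs and repos are made up for illustration. Note that read assigns whatever remains of the line to its last variable, so for a nested path such as group/sub/proj the repo variable would receive sub/proj.

```shell
#!/bin/bash

# Made-up sample input: one scp-style SSH URL per line
urls='git@github.com:myOrg/my-repo.git
git@github.com:otherOrg/tool.git'

orgs=() repos=()
while IFS=':/' read -r _ git_org git_repo; do
  orgs+=("$git_org")
  repos+=("${git_repo%.git}")   # strip .git after splitting
done <<< "$urls"

# Print each org/repo pair
for i in "${!orgs[@]}"; do
  echo "${orgs[i]} ${repos[i]}"
done
```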

edited May 22 '23 at 22:14

answered May 22 '23 at 12:15

Fravadona

score 3 · Answer 3 · answered May 22 '23 at 12:34

Use a regular expression.

[[ $SSH_URL =~ git@github.com:(.*)/(.*)\.git ]]
REPO=${BASH_REMATCH[2]}
GIT_ORG=${BASH_REMATCH[1]}

The first element (index 0) of the BASH_REMATCH array is the entire matched string; subsequent elements hold the contents of the capture groups in the regular expression, numbered from left to right.
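A somewhat stricter variant, as a sketch: it anchors the pattern, keeps each captured part free of '/', does not hard-code the host, strips an optional .git before matching, and checks whether the match succeeded. The pattern is kept in a variable (the name re is just an illustration) and $re is left unquoted on the right of =~, because quoting it would make bash match it as a literal string instead of a regex.

```shell
#!/bin/bash

SSH_URL=git@github.com:myOrg/my-repo.git

# user@host:org/repo, where neither org nor repo may contain '/'
re='^[^@]+@[^:]+:([^/]+)/([^/]+)$'

# Match against the URL with any trailing .git removed
if [[ ${SSH_URL%.git} =~ $re ]]; then
  git_org=${BASH_REMATCH[1]}
  repo=${BASH_REMATCH[2]}
  echo "$git_org $repo"   # myOrg my-repo
else
  echo "unrecognized URL: $SSH_URL" >&2
  exit 1
fi
```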

markp-fuso · Answer 4 · 2023-05-22T19:16:40.650

While the other answers are going to be more efficient (ie, they avoid the overhead of spawning 4 subshells), here are some considerations regarding OP's current sed solution:

in the 1st sed script the # is used as the script delimiter
in the 2nd sed script the / is used as the script delimiter
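the error is being generated by the 2nd sed script: since the data contains a /, the pattern also has to match a /, and any / in the script that isn't properly escaped is taken as a delimiter, so sed ends the regex (or replacement) early and treats the leftover characters as options to s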
try using # as the script delimiter in the 2nd sed command to eliminate the error message
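Following that suggestion, here is a sketch of the question's second sed command rewritten with # as the s delimiter. The / in the pattern then needs no escaping, and the command stays in sed's default basic regex syntax, so groups are written \( \). Both values come from the same pattern, with \1 for the org and \2 for the repo; git_org and repo are lower-case stand-ins for the question's GIT_ORG and REPO.

```shell
#!/bin/bash

SSH_URL=git@github.com:myOrg/my-repo.git

# BRE: \(...\) are capture groups; '#' delimits the s command,
# so the literal '/' between org and repo is written unescaped.
git_org=$(echo "$SSH_URL" | sed -n 's#^.*:\([^/]*\)/\(.*\)\.git$#\1#p')
repo=$(echo "$SSH_URL" | sed -n 's#^.*:\([^/]*\)/\(.*\)\.git$#\2#p')

echo "$git_org"   # myOrg
echo "$repo"      # my-repo
```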

As for the current regexes, this may be easier to address if we enable extended regex support (-E or -r), eg:

$ echo "$SSH_URL" | sed -nE 's#^.*:([^/]*)/.*$#\1#p'
myOrg

$ echo "$SSH_URL" | sed -nE 's#^.*/([^\.]*)\..*$#\1#p'
my-repo

Eliminating the pipe/subshell with a here-string (<<< "$var"):

$ sed -nE 's#^.*:([^/]*)/.*$#\1#p' <<< "$SSH_URL"
myOrg

$ sed -nE 's#^.*/([^\.]*)\..*$#\1#p' <<< "$SSH_URL"
my-repo

Pulling all of this into OP's current code:

$ REPO=$(sed -nE 's#^.*/([^\.]*)\..*$#\1#p' <<< "$SSH_URL")
$ GIT_ORG=$(sed -nE 's#^.*:([^/]*)/.*$#\1#p' <<< "$SSH_URL")

$ typeset -p REPO GIT_ORG
declare -- REPO="my-repo"
declare -- GIT_ORG="myOrg"

NOTES:

the $( ... ) construct will still require a subshell to be spawned (2 total in this case)
consider getting into the habit of using lower-cased variable names (eg, ssh_url, repo and git_org) to minimize the (future) chance of overwriting system variables (eg, PWD, HOME, PATH)
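If the number of subshells matters, a single sed call can emit both captures and read can split them, so only one $( ... ) is needed. A sketch, assuming org and repo names never contain whitespace (true for GitHub names, but an assumption for other hosts) and that the URL ends in .git; the lower-case names follow the note above.

```shell
#!/bin/bash

ssh_url=git@github.com:myOrg/my-repo.git

# One sed call prints "org repo"; read splits on the space (default IFS).
read -r git_org repo <<< "$(sed -nE 's#^[^:]*:([^/]*)/(.*)\.git$#\1 \2#p' <<< "$ssh_url")"

echo "$git_org"   # myOrg
echo "$repo"      # my-repo
```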
