How do I extract the domain name from a URL using bash? E.g. turn http://example.com/ into example.com. It must work for any TLD, not just .com
-
Dup: http://stackoverflow.com/questions/827024/how-do-i-extract-the-domain-out-of-an-url – Dennis Williamson Mar 23 '10 at 07:04
-
That is Perl, not Bash, though. – Apr 22 '10 at 00:34
-
Basically all of the answers here are broken, except bewilderingly the Ruby one. You need to know the subdomain policy of the top-level domain before you can decide which is the root domain. Look for the Public Suffix List. In very brief, you want to handle cases like `www.surrey.bbc.co.uk`, `www.nic.ad.jp`, `www.city.nagoya.jp`, etc. – tripleee Nov 14 '22 at 13:20
-
@tripleee: Posted today a [pure bash answer](https://stackoverflow.com/a/74948263/1765658) with a chapter addressing your comment! – F. Hauri - Give Up GitHub Dec 29 '22 at 10:33
16 Answers
You can use a simple awk one-liner to extract the domain name, as follows:
echo http://example.com/index.php | awk -F[/:] '{print $4}'
OUTPUT: example.com
:-)
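As noted in a comment below, this prints nothing when the URL has no scheme. A possible fallback, a sketch assuming a schemeless URL starts with the host itself:
echo "example.com/index.php/test" | awk -F[/:] '{h = ($1 ~ /^https?$/) ? $4 : $1; print h}'
OUTPUT: example.com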
-
Nice, this is so much better than the answers provided in https://stackoverflow.com/questions/6174220/parse-url-in-shell-script ! – bk138 Dec 06 '14 at 01:19
-
`echo http://example.com:3030/index.php | awk -F/ '{print $3}'` `example.com:3030` :-( – Ben Burns Mar 24 '15 at 09:16
-
you could split on `:` again to get it, but it's not flexible enough to accept URLs both with and without a port. – chovy Dec 29 '15 at 03:30
-
What if I need to keep the scheme, as in http(s)://example.com? I tried printing $1$3 and it gives http:example.com (missing '//' after http). Any idea? – 3AK Jun 16 '16 at 05:14
-
I got it by using this: `echo http://www.example.com/somedir/someotherdir/index.html | cut -d'/' -f1,2,3` gives `http://www.example.com` – 3AK Jun 16 '16 at 05:44
-
@Michael If I also want to remove www but not any other subdomain (e.g., www.example.com -> example.com but home.example.com -> home.example.com)? – d-b Jun 13 '18 at 06:15
-
On MacOS it makes sense to do this: `echo http://example.com/index.php | awk -F/ '{print $3}' | awk -F: '{print $1}'` – derFunk Aug 21 '18 at 15:46
-
in case the URL contains `&`, wrap it in quotes when passing it as the parameter. – Vishrant Oct 09 '20 at 01:46
-
This does not work without http or https; for example, example.com/index.php/test would return blank. – MaXi32 Jul 31 '21 at 11:06
$ URI="http://user:pw@example.com:80/"
$ echo $URI | sed -e 's/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/'
example.com

-
This works with or without a port and with deep paths, and is still usable from bash, although it doesn't work on Mac. – chovy Dec 29 '15 at 03:34
-
I use your suggestion with a little extra to strip out any subdomains that might be in the url ->> `echo http://www.mail.example.com:3030/index.php | sed -e "s/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/" | awk -F. '{print $(NF-1) "." $NF}'` so I basically cut your output at the dots and take the last and second-to-last columns and patch them back together with the dot. – sakumatto Nov 01 '17 at 14:33
-
**This is the best answer!** I used this for a ping command that allows full URLs: https://unix.stackexchange.com/a/428990/20661 stripping only the `www.` subdomain – rubo77 Mar 08 '18 at 10:52
-
For those who want to get the port: `sed -e "s/[^/]*\/\/\([^@]*@\)\?\([^:/]*\)\(:\([0-9]\{1,5\}\)\)\?.*/\4/"` – wheeler Apr 26 '18 at 23:38
-
@sakumatto works fine, but how would it support `https://example.com.uk`, for example? – sanNeck Apr 15 '21 at 17:11
basename "http://example.com"
Now of course, this won't work with a URI like this: http://www.example.com/index.html
but you could do the following:
basename $(dirname "http://www.example.com/index.html")
Or for more complex URIs:
echo "http://www.example.com/somedir/someotherdir/index.html" | cut -d'/' -f3
-d means "delimiter" and -f means "field"; in the above example, the third field delimited by the forward slash '/' is www.example.com.

-
fails if you add a port: `echo "http://www.example.com:8080/somedir/someotherdir/index.html" | cut -d'/' -f3` – chovy Dec 29 '15 at 03:31
-
got `http://www.example.com` by running `echo http://www.example.com/somedir/someotherdir/index.html | cut -d'/' -f1,2,3` – 3AK Jun 16 '16 at 05:49
-
`basename $(dirname ...)` does not work if the URL ends at the domain: `basename $(dirname "http://www.example.com/")` will show just `http:` – rubo77 Mar 08 '18 at 10:37
echo $URL | cut -d'/' -f3 | cut -d':' -f1
Works for URLs:
http://host.example.com
http://host.example.com/hi/there
http://host.example.com:2345/hi/there
http://host.example.com:2345
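As a comment below points out, this extracts the full host rather than the registrable domain. A naive sketch that keeps only the last two labels (it breaks on multi-label public suffixes such as .co.uk; see the Public Suffix discussion under the question):
echo "$URL" | cut -d'/' -f3 | cut -d':' -f1 | awk -F. '{print $(NF-1) "." $NF}'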

-
I found this more useful as it would return the url as it is when it doesn't contain 'http://' i.e. `abc.com` will be retained as `abc.com` – Udayraj Deshmukh Nov 05 '18 at 08:16
-
This is in fact the most intuitive, concise and effective method of all the answers here! – Robert Aug 15 '21 at 14:22
-
This extracts `host.example.com` rather than the domain name (`example.com`) asked for. – Lucas Apr 05 '22 at 19:19
sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_'
e.g.
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com'
example.com
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'https://example.com'
example.com
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com:1234/some/path'
example.com
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path'
example.com
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment'
example.com
$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment?params=true'
example.com

-
Boom! `HOST=$(sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< "$MYURL")` is fine in Bash – 4Z4T4R May 26 '17 at 17:58
-
I would like to strip www from the domain. In that case, how should I change the command? – Ceylan B. Apr 25 '19 at 08:22
-
Thanks for this, very handy. To capture the path from the URL, I extend it slightly by adding a third group: `sed -E -e 's_.*://([^/@]*@)?([^/:]+)(.*)_\3_' <<< 'http://example.com/path/to/something'` grabs the path. – Max Barrass May 05 '22 at 03:53
#!/usr/bin/perl -w
use strict;
my $url = $ARGV[0];
if ($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/g) {
    print $2;
}
Usage:
./test.pl 'https://example.com'
example.com
./test.pl 'https://www.example.com/'
www.example.com
./test.pl 'example.org/'
example.org
./test.pl 'example.org'
example.org
./test.pl 'example' -> no output
And if you just want the domain and not the full host + domain, use this instead:
#!/usr/bin/perl -w
use strict;
my $url = $ARGV[0];
if ($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/g) {
    print $3;
}

-
Of course the last one doesn't know about "www.example.co.uk" http://search.cpan.org/~nmelnick/Domain-PublicSuffix-0.04/lib/Domain/PublicSuffix.pm – Dennis Williamson Mar 23 '10 at 07:03
-
True, and if there is an API for it obviously I'd go with that anyway. Seems like the complete solution would actually have to know all valid country codes and check to see if the last post-dot region was a country code... – Dark Castle Mar 23 '10 at 13:56
Instead of using a regex to do this, you can use Python's urlparse:
URL=http://www.example.com
python -c "from urlparse import urlparse
url = urlparse('$URL')
print url.netloc"
You could either use it like this or put it in a small script. However, this still expects a valid scheme identifier, and looking at your comment, your input doesn't necessarily provide one. You can specify a default scheme, but urlparse then expects the netloc to start with '//':
url = urlparse('//www.example.com/index.html','http')
So you will have to prepend the slashes manually, i.e.:
python -c "from urlparse import urlparse
if '$URL'.find('://') == -1 then:
url = urlparse('//$URL','http')
else:
url = urlparse('$URL')
print url.netloc"

3 answers: short URL parsing (shell+bash) and a full TLD extractor
Remark about the question: the question asks for a regex, but the goal is really to split a string on the / character! That's an XY problem; using a regex for this kind of job is overkill!
POSIX shell first
Instead of forking other binaries like awk, perl or cut, we can use parameter expansion, which is quicker:
URL="http://example.com/some/path/to/page.html"
prot="${URL%%:*}"
link="${URL#$prot://}"
domain="${link%%/*}"
link="${link#$domain}"
printf '%-8s: %s\n' Protocol "${prot%:}" Domain "$domain" Link "$link"
Protocol: http
Domain  : example.com
Link    : /some/path/to/page.html
Note: this works even with a file:// URL:
URL=file:///tmp/so/test.xml
prot="${URL%%:*}"
link="${URL#$prot://}"
domain="${link%%/*}"
link="${link#$domain}"
printf '%-8s: %s\n' Protocol "${prot%:}" Domain "$domain" Link "$link"
Protocol: file
Domain  :
Link    : /tmp/so/test.xml
Read URL parts using bash
As this question is tagged bash and no other answer addresses read, here is a short, quick and reliable solution:
URL="http://example.com/some/path/to/page.html"
IFS=/ read -r prot _ domain link <<<"$URL"
That's all. As read is a builtin, this is the quickest way! (See the comment below for an even quicker function.)
From there you could run:
printf '%-8s: %s\n' Protocol "${prot%:}" Domain "$domain" Link "/$link"
Protocol: http
Domain  : example.com
Link    : /some/path/to/page.html
You could even check for port:
URL="http://example.com:8000/some/path/to/page.html"
IFS=/ read -r prot _ domain link <<<"$URL"
IFS=: read -r domain port <<<"$domain"
printf '%-8s: %s\n' Protocol "${prot%:}" Domain "$domain" Port "$port" Link "/$link"
Protocol: http
Domain  : example.com
Port    : 8000
Link    : /some/path/to/page.html
Full parsing with default ports:
URL="https://stackoverflow.com/questions/2497215/how-to-extract-domain-name-from-url"
declare -A DEFPORTS='([http]=80 [https]=443 [ipp]=631 [ftp]=21)'
IFS=/ read -r prot _ domain link <<<"$URL"
IFS=: read -r domain port <<<"$domain"
printf '%-8s: %s\n' Protocol "${prot%:}" Domain "$domain" \
Port "${port:-${DEFPORTS[${prot%:}]}}" Link "/$link"
Protocol: https
Domain  : stackoverflow.com
Port    : 443
Link    : /questions/2497215/how-to-extract-domain-name-from-url
Full top-level domain extractor (in pure bash):
Regarding public suffixes and @tripleee's comment: there is one fork to wget, done only once, at function initialization:
declare -A TLD='()'
initTld () {
    local tld
    while read -r tld; do
        [[ -n ${tld//*[ \/;*]*} ]] && TLD["${tld#\!}"]=''
    done < <(
        wget -qO - https://publicsuffix.org/list/public_suffix_list.dat
    )
}
tldExtract () {
    if [[ $1 == -v ]]; then local _tld_out_var=$2; shift 2; fi
    local dom tld=$1 _tld_out_var
    while [[ ! -v TLD[${tld}] ]] && [[ -n $tld ]]; do
        IFS=. read -r dom tld <<< "$tld"
    done
    if [[ -v _tld_out_var ]]; then
        printf -v $_tld_out_var '%s %s' "$dom" "$tld"
    else
        echo "$dom $tld"
    fi
}
initTld ; unset -f initTld
Then
tldExtract www.stackoverflow.com
stackoverflow com
tldExtract sub.www.test.co.uk
test co.uk
tldExtract -v myVar sub.www.test.co.uk
echo ${myVar% *}
test
echo ${myVar#* }
co.uk
tldExtract -v myVar www2.sub.city.nagoya.jp
echo $myVar
sub city.nagoya.jp
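If re-downloading the list at every shell startup is a concern, a hedged variant of initTld could cache it in a file (the cache path and the one-day staleness policy are arbitrary choices here):
initTld () {
    local tld cache=${TMPDIR:-/tmp}/public_suffix_list.dat
    # refresh the cache if it is missing, empty, or older than one day
    [[ -s $cache && -n $(find "$cache" -mtime -1 2>/dev/null) ]] ||
        wget -qO "$cache" https://publicsuffix.org/list/public_suffix_list.dat
    while read -r tld; do
        [[ -n ${tld//*[ \/;*]*} ]] && TLD["${tld#\!}"]=''
    done < "$cache"
}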

-
Quicker function: `parseUrl() { local IFS=/ arry;arry=($4);printf -v $1 ${arry%:};printf -v $2 ${arry[2]};printf -v $3 "/${arry[*]:3}";}` to be used as a `read` replacement: `parseUrl prot domain link "$URL"` for populating the `$prot`, `$domain` and `$link` variables – F. Hauri - Give Up GitHub Mar 23 '23 at 11:43
There is so little info on how you get those URLs... please show more info next time. Are there parameters in the URL, etc.? Meanwhile, simple string manipulation works for your sample URL, e.g.:
$ s="http://example.com/index.php"
$ echo ${s%/*} # get rid of last "/" onwards
http://example.com
$ s=${s%/*}
$ echo ${s/#http:\/\//} # get rid of http://
example.com
Other ways, using GNU sed:
$ echo $s | sed 's/http:\/\///;s|\/.*||'
example.com
Using awk:
$ echo $s| awk '{gsub("http://|/.*","")}1'
example.com
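These one-liners only strip http://; a GNU sed variant that also handles https:// (a sketch):
$ echo "https://example.com/index.php" | sed 's|^https\?://||; s|/.*||'
example.com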

-
Your method doesn't work! `echo http://example.com/index.php | sed -r 's/http:\/\/|\///g'` gives output example.comindex.php and NOT example.com on cygwin. Please post a method that works. – Ben Smith Mar 23 '10 at 03:11
-
my method doesn't work because your sample url is different!! And you did not provide more info on what type of URLs you want to parse. You should write your question clearly, providing input examples and describing what output you want, next time! – ghostdog74 Mar 23 '10 at 03:31
-
The 2nd line seems to be incorrect. I copy-pasted the first 2 lines into my Ubuntu shell and got _http://example.com/index.php*_ – jpeltoniemi Jun 25 '12 at 16:58
The following will output "example.com":
URI="http://user@example.com/foo/bar/baz/?lala=foo"
ruby -ruri -e "p URI.parse('$URI').host"
For more info on what you can do with Ruby's URI class you'd have to consult the docs.

One solution that covers more cases is based on sed regexes:
echo http://example.com/index.php | sed -e 's#^https://\|^http://##' -e 's#:.*##' -e 's#/.*##'
That would work for URLs like:
http://example.com/index.php, http://example.com:4040/index.php, https://example.com/index.php
Please note that extracting only the domain name from a URL is a bit tricky, because the domain name's place in the hostname depends on the country (or, more generally, on the TLD) being used.
E.g. for Argentina: in www.personal.com.ar, the domain name is personal.com.ar, not com.ar, because this TLD uses subzones to specify the type of organization.
The tool that I've found manages these cases well is tldextract.
So based on the FQDN (host part of the URL), you would get the domain reliably this way:
tldextract personal.com.ar | cut -d " " -f 2,3 | sed 's/ /./'
(the other answers to get the FQDN out of the URL are good and should be used)
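tldextract is also usable as a Python library, so if the CLI wrapper isn't on your PATH you can call it inline. A sketch, assuming the package is installed (e.g. via pip install tldextract):
python3 -c "import tldextract
print(tldextract.extract('www.personal.com.ar').registered_domain)"
which should print personal.com.ar.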
hope this helps :) and thanks to tripleee !

-
There is no "above" or "below"; your answer could be first or last or in the middle depending on each visitor's display preferences. This is not a "corner case" but rather a central case where some popular global TLDs are common but actually the corner case. Nevertheless, +1 – tripleee Dec 29 '22 at 11:05
With Ruby you can use the Domainatrix library / gem
http://www.pauldix.net/2009/12/parse-domains-from-urls-easily-with-domainatrix.html
require 'rubygems'
require 'domainatrix'
s = 'http://www.champa.kku.ac.th/dir1/dir2/file?option1&option2'
url = Domainatrix.parse(s)
url.domain # => "kku"
great tool! :-)

Here's the Node.js way; it works with or without ports and with deep paths:
//get-hostname.js
'use strict';
const url = require('url');
const parts = url.parse(process.argv[2]);
console.log(parts.hostname);
Can be called like:
node get-hostname.js http://foo.example.com:8080/test/1/2/3.html
//foo.example.com
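url.parse is documented as a legacy API in current Node; the built-in WHATWG URL class exposes the same field, so an equivalent one-liner sketch would be:
node -e 'console.log(new URL(process.argv[1]).hostname)' 'http://foo.example.com:8080/test/1/2/3.html'
//foo.example.com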

Pure Bash implementation without any sub-shell or sub-process:
# Extract host from a URL
# $1: URL
# Note: the +( ) pattern below requires extglob
shopt -s extglob
function extractHost {
    local s="$1"
    s="${s/#*:\/\/}"          # strip the scheme (parameter expansion & pattern matching)
    echo -n "${s/%+(:*|\/*)}" # strip everything from the port or path onward
}
E.g. extractHost "docker://1.2.3.4:1234/a/v/c"
will output 1.2.3.4

Using bash built-in regex (no external utilities needed):
#!/usr/bin/env bash
url=https://stackoverflow.com/questions/2497215/how-to-extract-domain-name-from-url
if [[ $url =~ ^(https?://[^/]+) ]]; then
    host="${BASH_REMATCH[1]}"
    echo "HOST: $host"
else
    echo "Invalid URL $url"
    exit 1
fi
# OUTPUT
# HOST: https://stackoverflow.com
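To get just the hostname without the scheme, one possible tweak (a sketch) is to move the capture group past the ://:
if [[ $url =~ ^https?://([^/:]+) ]]; then
    echo "HOST: ${BASH_REMATCH[1]}"
fi
# OUTPUT
# HOST: stackoverflow.com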
