12

Basically, I need to get those rows which contain domain and subdomain name from a URL or the whole website name excluding www.

My DB table looks like this:

+----------+------------------------+
|    id    |    website             |
+----------+------------------------+
| 1        | https://www.google.com |
+----------+------------------------+
| 2        | http://www.google.co.in|
+----------+------------------------+
| 3        | www.google.com         |
+----------+------------------------+
| 4        | www.google.co.in       |
+----------+------------------------+
| 5        | google.com             |
+----------+------------------------+
| 6        | google.co.in           |
+----------+------------------------+
| 7        | http://google.co.in    |
+----------+------------------------+

Expected output:

google.com
google.co.in
google.com
google.co.in
google.com
google.co.in
google.co.in

My Postgres Query looks like this:

select id, substring(website from '.*://([^/]*)') as website_domain from contacts

But above query give blank websites. So, how I can get the desired output?

m-ketan
  • 1,258
  • 2
  • 17
  • 23
Shubham Srivastava
  • 1,190
  • 14
  • 28

2 Answers2

20

You must use the "non capturing" match ?: to cope with the non "http://" websites.

like

select 
  id, 
  substring(website from '(?:.*://)?(?:www\.)?([^/?]*)') as website_domain     
from contacts;

SQL Fiddle: http://sqlfiddle.com/#!17/f890c/2/0

PostgreSQL's regular expressions: https://www.postgresql.org/docs/9.3/functions-matching.html#POSIX-ATOMS-TABLE

naXa stands with Ukraine
  • 35,493
  • 19
  • 190
  • 259
eMerzh
  • 485
  • 1
  • 4
  • 12
13

You may use

SELECT REGEXP_REPLACE(website, '^(https?://)?(www\.)?', '') from tbl;

See the regex demo.

Details

  • ^ - start of string
  • (https?://)? - 1 or 0 occurrences of http:// or https://
  • (www\.)? - 1 or 0 occurrences of www.

See the PostgreSQL demo:

CREATE TABLE tb1
    (website character varying)
;

INSERT INTO tb1
    (website)
VALUES
    ('https://www.google.com'),
    ('http://www.google.co.in'),
    ('www.google.com'),
    ('www.google.co.in'),
    ('google.com'),
    ('google.co.in'),
    ('http://google.co.in')
;

SELECT REGEXP_REPLACE(website, '^(https?://)?(www\.)?', '') from tb1;

Result:

enter image description here

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Thanks. This also works for me. One more thing my friend and me working on the same project but due to C# he is using SQLite(MySQL). So Can you also help me because REGEXP_REPLACE is not supporting MySQL – Shubham Srivastava Nov 29 '17 at 07:04
  • @ShubhamSrivastava In C#, with MS SQL, you need to use an UDF to be able to use regex. See [this answer](http://stackoverflow.com/a/31693412/3832970) that should help to get started. Also, you might want to use [this script](https://sqlsharp.com/free/). – Wiktor Stribiżew Nov 29 '17 at 07:32