trimming www. and http:// best practices

Question

My code works; I just want to know whether it is bad practice, because I suspect it is. I tried several preg_replace patterns, but none of them seemed to work, so I just wrote it like this.

As input I expect a URL in one of these forms:

google.com
www.google.com
http://google.com
http://www.google.com

As a result I need:

google.com

My code:

$website = trim($website); //removes space characters
$website = trim($website, '/');
$website = trim($website, 'http://');
$website = trim($website, 'www.');

http://stackoverflow.com/questions/6738752/regex-for-dropping-http-and-www-from-urls — aebersold, Dec 30 '13 at 15:51
Contrary to popular belief (esp. among Management) the `www.` string is not the standard protocol prefix for web sites. That'd be `http:` and `https:`. Stripping www blindly can eventually just break the URL — Álvaro González, Jan 03 '17 at 12:45

h2ooooooo · Accepted Answer · 2013-12-30T15:58:11.803

trim treats its second argument as a set of characters, not as a string: it strips every leading and trailing character that appears in that set (so www. is the same as .w).
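To make that concrete, here is a minimal sketch that wraps the trim() chain from the question in a function and runs it on a few hosts. The name naiveStrip() and the sample URLs are only illustrations, not part of the question.

```php
<?php
// Reproduces the trim() chain from the question to show what it actually strips.
// naiveStrip() is a hypothetical name used only for this illustration.
function naiveStrip(string $website): string
{
    $website = trim($website);            // strips surrounding whitespace
    $website = trim($website, '/');       // strips leading/trailing "/"
    $website = trim($website, 'http://'); // strips any of h, t, p, :, / from both ends
    $website = trim($website, 'www.');    // strips any of w, . from both ends
    return $website;
}

echo naiveStrip('http://www.google.com'), "\n";    // google.com (works only by luck)
echo naiveStrip('http://www.php.net'), "\n";       // php.ne (the trailing "t" is in the set)
echo naiveStrip('http://www.wikipedia.org'), "\n"; // ikipedia.org (the leading "w" is in the set)
```

So the code in the question only appears to work because google.com neither starts nor ends with one of the characters being trimmed.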

You're looking for preg_replace with a regex of ^(https?://)?(www\.)?:

$website = preg_replace('~^(https?://)?(www\.)?~i', '', $website);

Autopsy:

^ the match MUST start with whatever comes after this (makes sure that we only replace if the match is in the start)
(https?://)?
- http - the literal string http
- s? - an optional s (in case we use https)
- :// - the literal string ://
- ? - makes the whole thing optional
(www\.)?
- www\. - the literal string www. (you need to escape the . to \. as . means "any character")
- ? - makes the whole thing optional
i - this is the modifier, and i makes the whole pattern case-insensitive (it will match HTTP as well as http)
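As a quick check of the pattern dissected above, the following sketch runs it over the inputs from the question. The wrapper name stripSchemeAndWww() is hypothetical, and the final rtrim() for a trailing slash is an assumption carried over from the question's own code rather than something the regex does.

```php
<?php
// stripSchemeAndWww() is a hypothetical helper name, not part of the answer.
function stripSchemeAndWww(string $website): string
{
    $website = trim($website); // drop surrounding whitespace first
    // Remove an optional http:// or https:// and an optional www. at the start only.
    // preg_replace() returns null on a regex error, so fall back to the input.
    $stripped = preg_replace('~^(https?://)?(www\.)?~i', '', $website) ?? $website;
    return rtrim($stripped, '/'); // assumption: a trailing slash is unwanted, as in the question
}

$inputs = ['google.com', 'www.google.com', 'http://google.com', 'http://www.google.com/'];
foreach ($inputs as $input) {
    echo stripSchemeAndWww($input), "\n"; // prints google.com for each input
}
```

Unlike the trim() version, this leaves hosts such as wikipedia.org and php.net intact, because only the exact prefixes are removed.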

wow, really well explained. Thank you. It works :) — user1505027, Dec 30 '13 at 15:57

score 2 · Answer 2 · edited May 23 '17 at 11:46

KIS: Keep It Simple.

http://www.php.net/parse_url

From the docs:

<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';

print_r(parse_url($url));

echo parse_url($url, PHP_URL_PATH);
?>

Array
(
    [scheme] => http
    [host] => hostname
    [user] => username
    [pass] => password
    [path] => /path
    [query] => arg=value
    [fragment] => anchor
)

EDIT: Once you have the host, see "PHP Getting Domain Name From Subdomain" for extracting the domain name from it.
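For the inputs in the question, a rough sketch of the parse_url route could look like the following. Two things are worth knowing: parse_url only reports a host when the URL has a scheme (a bare google.com is parsed as a path), and the host it returns still includes any www. prefix. The helper name hostWithoutWww() and the choice to assume http:// for scheme-less input are assumptions made for this example.

```php
<?php
// hostWithoutWww() is a hypothetical helper name used for illustration.
function hostWithoutWww(string $url): ?string
{
    $url = trim($url);
    // parse_url() treats "google.com" as a path, so assume http:// when no scheme is given.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // malformed URL or no host component
    }
    // The host still carries "www." if it was present, so strip that prefix separately.
    return preg_replace('~^www\.~i', '', $host);
}

echo hostWithoutWww('google.com'), "\n";                     // google.com
echo hostWithoutWww('http://www.google.com/path?x=1'), "\n"; // google.com
```

Compared with a single regex, this also discards the path, query string and fragment, which may or may not be what you want.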

answered Dec 30 '13 at 15:56 by Anyone

Even if you used `parse_url($url, PHP_URL_HOST)` (I have no clue why you use `path`), you'd still get `www.google.com` which is not what OP wants. [DEMO](http://codepad.org/ETWsQ4yQ). – h2ooooooo Dec 30 '13 at 16:00
But it's a lot easier to obtain the actual domain from this. – Anyone Dec 30 '13 at 16:08

score 0 · Answer 3 · answered Dec 30 '13 at 15:52

Classic use case for RegEx. This snippet removes http(s) and www prefixes.

$new_url = preg_replace('/^(?:https?:\/\/)?(?:www\.)?(.*?)\/?$/i', '$1', $url);

answered by aebersold
