-1

For example if I have https://stackoverflow.com/questions/ask I'd like to cut it to stackoverflow.com/questions/ask or if I have http://www.samsung.com/au/ I'd like to cut it to samsung.com/au/.

I want to make a template tag for this but not sure what to return:

def clean_url(url):
    return ?

template

{{ url|clean_url }}

Any idea?

Zorgan
  • 8,227
  • 23
  • 106
  • 207

3 Answers3

2

Here is a quick and dirty way to isolate the domain provided it starts with something//

def clean(url):
  return url.partition('//')[2].partition('/')[0]
gahooa
  • 131,293
  • 12
  • 98
  • 101
  • Or just use the quick and clean `urllib.parse.urlparse(url).netloc` :) He wanted to keep the path though, not just the domain. – Paulo Almeida Apr 14 '18 at 01:05
1

urllib.parse will do most of this for you:

import urllib.parse
def clean_url(url):
    parts = list(urllib.parse.urlsplit(url))
    parts[0]=""
    cleaned = urllib.parse.urlunsplit(parts)[2:]
    return cleaned

Note this does not cut off the "www.", but you shouldn't do that; that can be a critical part of the domain name. If you really want that, add:

if cleaned.startswith("www."):
    cleaned = cleaned[4:]
codingatty
  • 2,026
  • 1
  • 23
  • 32
0

For the use cases, you described. You can just split on the double backslash and go with that or work from there.

def clean_url(url):
    clean = url.split('//')[1]
    if clean[0:4] == 'www.':
        return clean[4:]
    return clean

However, because the subdomain (such as 'www') can be used as a significant part of the url, you may want to keep that in. For example, www.pizza.com and pizza.com could be links to different pages.

Other things to consider are the urlparse library or regex but they may be overkill for this.

Zev
  • 3,423
  • 1
  • 20
  • 41