Truncating the end of variables based on pattern

Question

I have a list of URLs in formats such as "www.blah.com/en-us" and I need to cut off everything after the "www.blah.com". I've tried the following:

import re
website = www.blah.com/en-us
cleanURL = re.sub('(.|\n)*?com', "", website)

Output: 'en-us'

So I'm getting the opposite of what I want. Sorry if this post isn't formatted correctly; this is my first time asking a question.

Strange, when I run your code, I don't get `en-us`, I get `NameError: name 'www' is not defined`. Are you sure this is the exact code you're running? — Kevin, Jul 06 '17 at 19:30
Possibly a duplicate of https://stackoverflow.com/questions/27745/getting-parts-of-a-url-regex — Evan Wise, Jul 06 '17 at 19:32

Fulgen · Answer 1 · 2017-07-07T18:04:48.333

4

How about just using

website = "www.blah.com/en-us"
cleanURL = website.split("/",1)[0]

?

edited Jul 07 '17 at 18:04

answered Jul 06 '17 at 19:33

Fulgen


1

You don't need the conditional; `"www.blah.com".split("/") == ["www.blah.com"]` – chepner Jul 06 '17 at 19:36

score 2 · Answer 2 · answered Jul 06 '17 at 19:33


Is using regex a must? If there's no protocol (e.g. `http://`) in the URLs you're trying to process, you could just use `your_url_string.split('/', 1)[0]`, which splits on the first instance of '/' and gives you the part before it.
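Since the question mentions a list of URLs, here is a minimal sketch of applying that split to each entry. The helper name `strip_path` and the sample list are made up for illustration. The branch for URLs that do carry a protocol is an addition that swaps in the standard library's `urllib.parse.urlsplit`; it assumes the protocol shows up as "://", and note that the host it returns would still include a port if one is present.

```python
from urllib.parse import urlsplit

def strip_path(url):
    """Return the host part of a URL, with or without a protocol (hypothetical helper)."""
    if "://" in url:
        # urlsplit only fills in netloc when the "scheme://" prefix is present
        return urlsplit(url).netloc
    # No protocol: everything before the first "/" is the host.
    # If there is no "/" at all, split returns the whole string unchanged.
    return url.split("/", 1)[0]

# Sample data, made up for illustration
urls = ["www.blah.com/en-us", "www.blah.com", "http://www.blah.com/en-us/page"]
clean_urls = [strip_path(u) for u in urls]
print(clean_urls)
```

For input that never has a protocol, the plain `split('/', 1)[0]` on its own is enough.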

answered Jul 06 '17 at 19:33

Andrew Zick

