How to remove query string from a url?

Question

I have the following URLs:

https://stackoverflow.com/questions/7990301?aaa=aaa
https://stackoverflow.com/questions/7990300?fr=aladdin
https://stackoverflow.com/questions/22375#6
https://stackoverflow.com/questions/22375?
https://stackoverflow.com/questions/22375#3_1

I need them to look like this:

https://stackoverflow.com/questions/7990301
https://stackoverflow.com/questions/7990300
https://stackoverflow.com/questions/22375
https://stackoverflow.com/questions/22375
https://stackoverflow.com/questions/22375

My attempt:

url = 'https://stackoverflow.com/questions/7990301?aaa=aaa'
if '?' in url:
    url = url.split('?')[0]
if '#' in url:
    url = url.split('#')[0]

I think this is a stupid way to do it.

Matthew Story · Accepted Answer · 2018-06-29T02:57:10.250

15

The very helpful library furl makes it trivial to remove both query and fragment parts:

>>> import furl
>>> furl.furl("https://hi.com/?abc=def#ghi").remove(args=True, fragment=True).url
'https://hi.com/'

edited Jun 29 '18 at 02:57

answered Jun 29 '18 at 02:51

4

Why download this library when the builtin Python way is basically exactly the same: `from urllib.parse import urlsplit, urlunsplit` then `urlunsplit(urlsplit("https://hi.com/?abc=def#ghi")._replace(query="", fragment=""))` – Boris Verkhovskiy May 08 '21 at 04:20

TheDavidFactor · Answer 2 · 2018-06-29T03:33:00.347

7

You can split on a separator that doesn't exist in the string; you'll just get a list of one element. So, depending on your goal, you could simplify your existing code like this:

url = url.split('?')[0].split('#')[0]

Not saying this is the best way (furl is a great solution), but it is a way.
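
To illustrate the point about splitting on a missing separator, here is a minimal sketch. The helper name `strip_query_and_fragment` is made up for this example, and the approach assumes `?` and `#` only ever appear in the URL as the query and fragment delimiters:

```python
def strip_query_and_fragment(url):
    # str.split always returns at least one element, so indexing [0]
    # is safe even when the separator is absent from the string.
    return url.split('?')[0].split('#')[0]


# No '?' in the string: split returns a one-element list.
print("https://stackoverflow.com/questions/22375".split('?'))
# ['https://stackoverflow.com/questions/22375']

print(strip_query_and_fragment("https://stackoverflow.com/questions/22375#3_1"))
# https://stackoverflow.com/questions/22375
```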

edited Jun 29 '18 at 03:33

answered Jun 29 '18 at 03:08

score 4 · Answer 3 · edited Oct 07 '21 at 10:55

In your example you're also removing the fragment (the thing after a #), not just the query.

You can remove both by using urllib.parse.urlsplit, then calling ._replace on the namedtuple it returns and converting back to a string URL with urllib.parse.urlunsplit:

from urllib.parse import urlsplit, urlunsplit

def remove_query_params_and_fragment(url):
    return urlunsplit(urlsplit(url)._replace(query="", fragment=""))

Output:

>>> remove_query_params_and_fragment("https://stackoverflow.com/questions/7990301?aaa=aaa")
'https://stackoverflow.com/questions/7990301'
>>> remove_query_params_and_fragment("https://stackoverflow.com/questions/7990300?fr=aladdin")
'https://stackoverflow.com/questions/7990300'
>>> remove_query_params_and_fragment("https://stackoverflow.com/questions/22375#6")
'https://stackoverflow.com/questions/22375'
>>> remove_query_params_and_fragment("https://stackoverflow.com/questions/22375?")
'https://stackoverflow.com/questions/22375'
>>> remove_query_params_and_fragment("https://stackoverflow.com/questions/22375#3_1")
'https://stackoverflow.com/questions/22375'
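
To see why `._replace` works here, it may help to look at what `urlsplit` returns. The sketch below uses a sample URL that combines a query and a fragment (made up for illustration, not one of the URLs from the question) and prints each field of the resulting named tuple:

```python
from urllib.parse import urlsplit

# Sample URL chosen for illustration; it has both a query and a fragment.
parts = urlsplit("https://stackoverflow.com/questions/22375?aaa=aaa#6")

print(parts.scheme)    # https
print(parts.netloc)    # stackoverflow.com
print(parts.path)      # /questions/22375
print(parts.query)     # aaa=aaa
print(parts.fragment)  # 6

# _replace returns a new tuple with the given fields swapped out;
# geturl() reassembles it, equivalent to calling urlunsplit on it.
cleaned = parts._replace(query="", fragment="")
print(cleaned.geturl())  # https://stackoverflow.com/questions/22375
```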

Jay Calamari · Answer 4 · 2018-06-29T02:52:19.073

You could try

urls = ["https://stackoverflow.com/questions/7990301?aaa=aaa",
        "https://stackoverflow.com/questions/7990300?fr=aladdin",
        "https://stackoverflow.com/questions/22375#6",
        "https://stackoverflow.com/questions/22375?",
        "https://stackoverflow.com/questions/22375#3_1"]

urls_without_query = [url.split('?')[0] for url in urls]

For example, "https://stackoverflow.com/questions/7990301?aaa=aaa".split('?') returns a list that looks like ["https://stackoverflow.com/questions/7990301", "aaa=aaa"], and if that string is url, url.split('?')[0] would give you "https://stackoverflow.com/questions/7990301".

Edit: I didn't think about # fragments. The other answers might help you more :)

This does not remove fragments, and is not better than the solution the OP is looking to improve upon. — Matthew Story, Jun 29 '18 at 02:52

score 1 · Answer 5 · answered Apr 20 '20 at 17:57

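You can use w3lib. Called with no parameter list, url_query_cleaner drops the whole query string (and, by default, the fragment as well):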

from w3lib import url as w3_url
url_without_query = w3_url.url_query_cleaner(url)

Lücks

score 0 · Answer 6 · answered Oct 06 '20 at 15:32

Here is an answer using standard libraries, and which parses the URL properly:

from urllib.parse import urlparse

url = 'http://www.example.com/this/category?one=two'
parsed = urlparse(url)
print("".join([parsed.scheme, "://", parsed.netloc, parsed.path]))

Expected output:

http://www.example.com/this/category

Note: this also strips params and the fragment, but is easy to modify to include those if you want.
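
As one possible way to do that modification, the sketch below keeps params and the fragment and drops only the query. It swaps the manual join for `urlunparse`, and the helper name `remove_query_only` is made up for this example:

```python
from urllib.parse import urlparse, urlunparse


def remove_query_only(url):
    # urlparse returns a 6-field named tuple:
    # (scheme, netloc, path, params, query, fragment).
    parsed = urlparse(url)
    # Blank out only the query; urlunparse puts the rest back together.
    return urlunparse(parsed._replace(query=""))


print(remove_query_only("http://www.example.com/this/category?one=two#top"))
# http://www.example.com/this/category#top
```

To drop the fragment as well, pass `fragment=""` to the same `_replace` call.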
