How to remove the '%20' from a url in Python?

Question

I'm trying to use Python (not C#, PHP, or another language) to remove the %20 sequences from a URL after converting it to a string. However, the sequences stay unchanged no matter what I try.

Here is the code I tried:

url = 'https://www.amazon.com/s?k=hbb%20magic%20dress' # Type string

title_text_data_file = url.split('=')[1]
if '%20'in title_text_data_file:
    title_text_data_file = title_text_data_file.replace('%20+', '')
    keyword = title_text_data_file.replace('+', ' ')
    title_text_data_file = title_text_data_file + ".txt"
    print('Keyword:',keyword,'- File title:',title_text_data_file,'- URL:',url)

Here is what I get:

Keyword: hbb%20magic%20dress - File title: hbb%20magic%20dress.txt - URL: https://www.amazon.com/s?k=hbb%20magic%20dress

Here is what I would like to get:

Keyword: hbb magic dress - File title: hbb+magic+dress.txt - URL: https://www.amazon.com/s?k=hbb%20magic%20dress

`replace('%20+', '')` will replace `'%20+'` with empty string. Isn't just `'%20'` you need to replace? — Austin, Jul 12 '19 at 04:22
There's a lot more than `%20` that you need to deal with. See the above link for info on the Python lib that does this, which is `urllib.parse.unquote`. — Tom Karzes, Jul 12 '19 at 04:27
Hi @TomKarzes, I'm trying to modify the string without importing additional libraries or make the code heavier. — Pro Girl, Jul 12 '19 at 04:35
@AmatoIlCiabattaro I would suggest that writing your own code to do this is far "heavier" than importing something that truly solves the problem, rather than handling just 5% of the cases you may encounter. — Tom Karzes, Jul 12 '19 at 05:38

score 6 · Answer 1 · answered Jul 12 '19 at 04:35

Python's urllib.parse module can be used to decode a percent-encoded URL.

Example

import urllib.parse
url = 'https://www.amazon.com/s?k=hbb%20magic%20dress' # Type string
urllib.parse.unquote(url) # Returns 'https://www.amazon.com/s?k=hbb magic dress'
urllib.parse.unquote(url).replace(" ","") # Returns 'https://www.amazon.com/s?k=hbbmagicdress'
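
As a small illustration of why a decoding function covers more than a manual replace of '%20', the sketch below compares unquote and unquote_plus on a made-up string (the value of `encoded` is an assumption for demonstration, not taken from the question). unquote decodes every percent-escape but leaves '+' alone, while unquote_plus also turns '+' into a space, which matches how query strings are usually encoded.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical input mixing '%20', '+', an encoded '&' (%26) and an encoded 'é' (%C3%A9)
encoded = 'hbb%20magic+dress%26caf%C3%A9'

# unquote decodes percent-escapes only; the '+' is kept as a literal plus sign
plain = unquote(encoded)

# unquote_plus additionally converts '+' to a space before decoding
plus = unquote_plus(encoded)

print(plain)
print(plus)
```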

score 5 · Answer 2 · answered Jul 12 '19 at 04:34

Actually, it is better to use libraries designed to deal with urls, as that will handle any urlencoded characters, not just spaces (%20). The standard library provides the urllib.parse module.

In your case you want to use

import urllib.parse
url = 'https://www.amazon.com/s?k=hbb%20magic%20dress'
# This extracts the query part from the url
query = urllib.parse.urlparse(url).query
# This gets the first k parameter, decoding any urlencoded character, not only spaces(%20)
keyword = urllib.parse.parse_qs(query)['k'][0]
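
To get both values the question asks for, one possible continuation is to re-encode the decoded keyword with quote_plus for the file title. This is only a sketch: the name `file_title` is introduced here for illustration, and it assumes the URL always carries a `k` parameter.

```python
from urllib.parse import urlparse, parse_qs, quote_plus

url = 'https://www.amazon.com/s?k=hbb%20magic%20dress'

# Decode the first value of the 'k' query parameter
query = urlparse(url).query
keyword = parse_qs(query)['k'][0]

# quote_plus encodes spaces as '+', giving the file title format from the question
file_title = quote_plus(keyword) + '.txt'

print('Keyword:', keyword, '- File title:', file_title, '- URL:', url)
```

Note that quote_plus also percent-encodes other reserved characters, so a keyword containing something like '/' would not be written into the file name literally.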

score -3 · Accepted Answer · answered Jul 12 '19 at 04:26

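str.replace(old, new[, count])

You cannot replace a substring that does not occur in the string. The original code searches for '%20+', which never appears in the URL; search for '%20' instead.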

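url = 'https://www.amazon.com/s?k=hbb%20magic%20dress'

# Take everything after the first '=' (assumes a single query parameter)
title_text_data_file = url.split('=')[1]
if '%20' in title_text_data_file:
    key = '%20'
    # Replace each '%20' with '+' for the file title
    title_text_data_file = title_text_data_file.replace(key, '+')
    # Replace each '+' with a space for the readable keyword
    keyword = title_text_data_file.replace('+', ' ')
    title_text_data_file = title_text_data_file + ".txt"
    print('Keyword:',keyword,'- File title:',title_text_data_file,'- URL:',url)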

Keyword: hbb magic dress - File title: hbb+magic+dress.txt - URL: https://www.amazon.com/s?k=hbb%20magic%20dress
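
One caveat with splitting on '=': it only works while the URL has a single query parameter. The sketch below uses a hypothetical URL with an extra `page` parameter (not from the question) to show the difference from parsing the query string with urllib.parse.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL with a second query parameter added for illustration
url = 'https://www.amazon.com/s?k=hbb%20magic%20dress&page=2'

# Splitting on '=' also captures the start of the next parameter
naive = url.split('=')[1]

# Parsing the query string isolates and decodes the 'k' value
robust = parse_qs(urlparse(url).query)['k'][0]

print(naive)
print(robust)
```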
