0

I'm trying to remove the %20 symbol from a URL with Python (not C#, PHP, or others) after converting the URL to a string. However, the symbol stays unchanged no matter what formatting I try.

Here is the code I tried:

url = 'https://www.amazon.com/s?k=hbb%20magic%20dress' # Type string

title_text_data_file = url.split('=')[1]
if '%20'in title_text_data_file:
    title_text_data_file = title_text_data_file.replace('%20+', '')
    keyword = title_text_data_file.replace('+', ' ')
    title_text_data_file = title_text_data_file + ".txt"
    print('Keyword:',keyword,'- File title:',title_text_data_file,'- URL:',url)

Here is what I get:

Keyword: hbb%20magic%20dress - File title: hbb%20magic%20dress.txt - URL: https://www.amazon.com/s?k=hbb%20magic%20dress

Here is what I would like to get:

Keyword: hbb magic dress - File title: hbb+magic+dress.txt - URL: https://www.amazon.com/s?k=hbb%20magic%20dress
Pro Girl
  • `replace('%20+', '')` will replace `'%20+'` with an empty string. Isn't it just `'%20'` that you need to replace? – Austin Jul 12 '19 at 04:22
  • There's a lot more than `%20` that you need to deal with. See the above link for info on the Python lib that does this, which is `urllib.parse.unquote`. – Tom Karzes Jul 12 '19 at 04:27
  • Hi @TomKarzes, I'm trying to modify the string without importing additional libraries or making the code heavier. – Pro Girl Jul 12 '19 at 04:35
  • @AmatoIlCiabattaro I would suggest that writing your own code to do this is far "heavier" than importing something that truly solves the problem, rather than handling just 5% of the cases you may encounter. – Tom Karzes Jul 12 '19 at 05:38
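
To illustrate the point made in the comments, here is a small sketch comparing a manual `%20` replacement with `urllib.parse.unquote`. The sample string is made up for illustration and does not come from the question.

```python
from urllib.parse import unquote

# Hypothetical encoded value containing more than just %20:
# %C3%A9 is 'é' in UTF-8, %26 is '&', %2F is '/'.
encoded = 'caf%C3%A9%20%26%20tea%2Fcoffee'

# Replacing only %20 leaves every other escape untouched.
manual = encoded.replace('%20', ' ')
print(manual)   # caf%C3%A9 %26 tea%2Fcoffee

# unquote decodes all percent escapes in one call.
decoded = unquote(encoded)
print(decoded)  # café & tea/coffee
```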

3 Answers

6

Python's urllib.parse module can be used to decode the encoded URL.

Example

import urllib.parse
url = 'https://www.amazon.com/s?k=hbb%20magic%20dress' # Type string
urllib.parse.unquote(url) # Returns 'https://www.amazon.com/s?k=hbb magic dress'
urllib.parse.unquote(url).replace(" ","") # Returns 'https://www.amazon.com/s?k=hbbmagicdress'
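
Building on `unquote`, the output requested in the question could be produced roughly as follows. This is a sketch that assumes the keyword is everything after the first `=`, as in the question's own code; the variable names `raw_value` and `file_title` are illustrative.

```python
from urllib.parse import unquote

url = 'https://www.amazon.com/s?k=hbb%20magic%20dress'

# Take everything after the first '=' (same approach as the question).
raw_value = url.split('=', 1)[1]

# unquote turns each %XX escape into its character, so %20 becomes a space.
keyword = unquote(raw_value)

# Build the file title by joining the words with '+'.
file_title = keyword.replace(' ', '+') + '.txt'

print('Keyword:', keyword, '- File title:', file_title, '- URL:', url)
```

With the URL above this prints the keyword as `hbb magic dress` and the file title as `hbb+magic+dress.txt`.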
Sakshi Gupta
5

Actually, it is better to use libraries designed to deal with URLs, as they will handle any URL-encoded characters, not just spaces (%20). The standard library provides the urllib.parse module.

In your case you want to use

import urllib.parse
url = 'https://www.amazon.com/s?k=hbb%20magic%20dress'
# This extracts the query part from the url
query = urllib.parse.urlparse(url).query
# This gets the first k parameter, decoding any urlencoded character, not only spaces(%20)
keyword = urllib.parse.parse_qs(query)['k'][0]
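
As a rough illustration of why parsing the query is more robust than splitting on `=`, the sketch below uses a URL with a second parameter. The `ref=example` parameter and the `+` in the keyword are assumptions added for the example, not part of the original URL.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL: the 'ref' parameter is made up for illustration.
url = 'https://www.amazon.com/s?k=hbb+magic%20dress&ref=example'

# parse_qs returns a dict mapping each parameter name to a list of values,
# decoding both '+' and %XX escapes.
params = parse_qs(urlparse(url).query)
print(params)  # {'k': ['hbb magic dress'], 'ref': ['example']}

keyword = params['k'][0]

# Splitting on '=' picks up part of the next parameter as well.
naive = url.split('=')[1]
print(naive)   # hbb+magic%20dress&ref
```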
Sebastian Kreft
-3

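str.replace(old, new[, count])

You cannot replace a substring that does not exist. The string never contains '%20+', so replace('%20+', '') leaves it unchanged. Replace '%20' instead: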

url = 'https://www.amazon.com/s?k=hbb%20magic%20dress'  # same URL as in the question
title_text_data_file = url.split('=')[1]
if '%20' in title_text_data_file:
    key = '%20'
    # Turn each %20 into '+' for the file title
    title_text_data_file = title_text_data_file.replace(key, '+')
    # Turn each '+' into a space for the keyword
    keyword = title_text_data_file.replace('+', ' ')
    title_text_data_file = title_text_data_file + ".txt"
    print('Keyword:', keyword, '- File title:', title_text_data_file, '- URL:', url)

Output:

Keyword: hbb magic dress - File title: hbb+magic+dress.txt - URL: https://www.amazon.com/s?k=hbb%20magic%20dress
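
The same import-free idea can be wrapped in a function so that the keyword is always defined, even when the URL contains no `%20`. This is only a sketch: `keyword_and_title` is a hypothetical helper name, and it handles nothing beyond `%20` and `+`.

```python
def keyword_and_title(url):
    """Hypothetical helper: handles only %20 and '+', no other escapes."""
    # Everything after the first '=' is treated as the search term.
    raw_value = url.split('=', 1)[1]
    # Normalise both space encodings to a plain space.
    keyword = raw_value.replace('%20', ' ').replace('+', ' ')
    # Join the words with '+' for the file title.
    file_title = keyword.replace(' ', '+') + '.txt'
    return keyword, file_title


keyword, file_title = keyword_and_title(
    'https://www.amazon.com/s?k=hbb%20magic%20dress')
print('Keyword:', keyword, '- File title:', file_title)
```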
Terrence Poe