5

I am trying to extract links from a webpage and then open them in my web browser. My Python program is able to successfully extract the links, but some links have spaces between them which cannot be open using request module.

For example example.com/A, B C it will not open using the request module. But if I convert it into example.com/A,%20B%20C it will open. Is there a simple way in python to fill the spaces with %20 ?

`http://example.com/A, B C` ---> `http://example.com/A,%20B%20C`

I want to convert all links which have spaces between them into the above format.

python
  • 4,403
  • 13
  • 56
  • 103

3 Answers3

5

urlencode actually takes a dictionary, for example:

>>> urllib.urlencode({'test':'param'})
'test=param'`

You actually need something like this:

import urllib
import urlparse

def url_fix(s, charset='utf-8'):
    if isinstance(s, unicode):
        s = s.encode(charset, 'ignore')
    scheme, netloc, path, qs, anchor = urlparse.urlsplit(s)
    path = urllib.quote(path, '/%')
    qs = urllib.quote_plus(qs, ':&=')
    return urlparse.urlunsplit((scheme, netloc, path, qs, anchor))

Then:

>>>url_fix('http://example.com/A, B C')    
'http://example.com/A%2C%20B%20C'

Taken from How can I normalize a URL in python

Community
  • 1
  • 1
rofls
  • 4,993
  • 3
  • 27
  • 37
  • I am using something like this `'%20'.join(line.split())`. Do you think it is a good decision ? – python Oct 10 '15 at 02:59
  • Well that will handle the spaces :) But I would recommend using the above snippet, since it appears to work rather well. The commas are also not supported in http urls, as well as a list of other characters. This appears to handle them correctly. – rofls Oct 10 '15 at 03:02
  • Yes, I agree with you. Thank you so much :) – python Oct 10 '15 at 03:03
  • This won't work in python3 as urlparse & urllib are combined into one and encoding to Unicode is not required as every string is considered as unicode in python 3. Check my last post on this page. – SANDEEP MACHIRAJU Oct 09 '20 at 00:22
1

use url encode:

import urllib
urllib.urlencode(yourstring)
ergonaut
  • 6,929
  • 1
  • 17
  • 47
0

Python 3 working solution for @rofls answer.

import urllib.parse as urlparse
def url_fix(s):
    scheme, netloc, path, qs, anchor = urlparse.urlsplit(s)
    path = urlparse.quote(path, '/%')
    qs = urlparse.quote_plus(qs, ':&=')
    return urlparse.urlunsplit((scheme, netloc, path, qs, anchor))
SANDEEP MACHIRAJU
  • 817
  • 10
  • 17