1

Suppose I have a URL as follows:

http://sitename.com/pathname?title=moviename&url=VIDEO_URL

I want to parse this URL to get the title part and url part alone separately.

I tried the following,

from urlparse import urlparse
q = urlparse('http://sitename.com/pathname?title=moviename&url=VIDEO_URL')

After I do this, I get the following result,

q
ParseResult(scheme='http', netloc='sitename.com', path='/pathname', params='', query='title=moviename&url=VIDEO_URL', fragment='')

and q.query has,

'title=moviename&url=VIDEO_URL'

I am not able to use q.query.title or q.query.url here. Is there a way I can access this? I would like to split the url and title part separately into separate columns. Can we do it this way or can we write a substring method which would check for starting with "title" and ending with "&" and split it?

Thanks

haimen

5 Answers

7

You can use urlparse.parse_qs here to make a dictionary of parameters.

from urlparse import urlparse, parse_qs
q = urlparse('http://sitename.com/pathname?title=moviename&url=VIDEO_URL')
qs = parse_qs(q.query)
print qs["title"]  # ['moviename'] (parse_qs returns a list of values per key)
print qs["url"][0]  # VIDEO_URL (index into the list to get a single value)

This is the most reliable way to parse a URL's parameters: much better than split.
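For readers on Python 3, here is a minimal sketch of the same approach. It assumes the Python 3 standard library, where the `urlparse` module was merged into `urllib.parse`; the variable names `params`, `title` and `video_url` are only illustrative.

```python
from urllib.parse import urlparse, parse_qs

url = 'http://sitename.com/pathname?title=moviename&url=VIDEO_URL'

# Split the URL into components, then parse only the query component.
query = urlparse(url).query
params = parse_qs(query)

# parse_qs maps each parameter name to a list of values,
# so take the first element to get a plain string.
title = params['title'][0]
video_url = params['url'][0]

print(title)
print(video_url)
```

The two strings can then be stored in separate columns, which is what the question asks for.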

Aaron Christiansen
1

urlparse can parse the URL; from the result, take the query and parse that with parse_qs:

>>> import urlparse
>>> url = 'http://sitename.com/pathname?title=moviename&url=VIDEO_URL'
>>> urlparse.parse_qs(urlparse.urlparse(url).query)
{'title': ['moviename'], 'url': ['VIDEO_URL']}

Because a query string parameter can appear multiple times, the dictionary maps each name to a list of the values found (even when only one value is found).
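To illustrate why the values are lists, here is a small sketch with a repeated parameter. It assumes Python 3 (`urllib.parse`); the query string and the `tag` and `year` names are made up for the example.

```python
from urllib.parse import parse_qs

# Hypothetical query string in which 'tag' appears twice.
query = 'title=moviename&tag=drama&tag=classic'
params = parse_qs(query)

# Every value is a list; repeated names collect all their values in order.
tags = params['tag']

# A common pattern for "first value or None" that also covers missing names.
first_title = params.get('title', [None])[0]
missing = params.get('year', [None])[0]

print(tags, first_title, missing)
```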

Jan Vlcinsky
0

You're doing it right, it's just that a standard URL is made of:

<SCHEME>://<NETLOC>/<PATH>?<QUERY>

So to extract the details from the query, you can split the string yourself, if you don't mind the quick and dirty way:

>>> data = dict(item.split('=') for item in q.query.split('&'))
>>> data
{'url': 'VIDEO_URL', 'title': 'moviename'}
>>> print(data['url'])

and there you have your URL! This is a very basic, naive version of what the urlparse library offers through the parse_qsl() method. That method also converts + into spaces, handles ';' as well as '&' as a separator, and unquotes percent-encoded values.

parse_qsl returns a list of (name, value) pairs, so to use it in the same way, wrap the result in dict():

>>> import urlparse
>>> data = dict(urlparse.parse_qsl(q.query))
>>> data
{'url': 'VIDEO_URL', 'title': 'moviename'}
>>> print(data['url'])
VIDEO_URL

N.B.: it's NOT safer to use parse_qsl than the split() method, but more RELIABLE. The main difference is that parse_qsl will work with all possible use cases of queries as defined by the RFC, whereas the split() method works with a single case.
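As a rough illustration of that difference, the sketch below runs both approaches on a query whose values are encoded. It assumes Python 3 (`urllib.parse`), and the query string is invented for the example.

```python
from urllib.parse import parse_qsl

# Hypothetical query: '+' stands for a space, and the url value is percent-encoded.
query = 'title=The+Movie&url=http%3A%2F%2Fexample.com%2Fv%3Fid%3D1'

# The manual split keeps the raw, still-encoded text.
naive = dict(item.split('=') for item in query.split('&'))

# parse_qsl decodes '+' and percent-escapes while splitting.
parsed = dict(parse_qsl(query))

print(naive['title'], '|', parsed['title'])
print(naive['url'], '|', parsed['url'])
```

The manual split returns the encoded strings unchanged, while parse_qsl returns the decoded title and the full nested URL.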

zmo
0

These answers are spot on for parsing the query string. To go a step further and also use dot notation, also see Convert Python dict to object?

from collections import namedtuple
QS = namedtuple('QS', qs.keys())
dotted_qs = QS(**qs)
dotted_qs.url  # ['VIDEO_URL']

Note that the dict that comes back from parse_qs can be multi-valued, hence the list return type of dotted_qs.url. You can collapse it to a single value per key with a dict comprehension or parse_qsl:

qs = {k: v[0] for k, v in parse_qs(q.query).items()}

Or...

qs = dict(urlparse.parse_qsl(q.query))  # needs "import urlparse" (the module)
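Putting the pieces together, here is a minimal sketch of dot access with single values. It assumes Python 3 (`urllib.parse`) and that every parameter name is a valid Python identifier, which namedtuple requires; the names `single` and `dotted` are illustrative.

```python
from collections import namedtuple
from urllib.parse import urlparse, parse_qsl

q = urlparse('http://sitename.com/pathname?title=moviename&url=VIDEO_URL')

# dict(parse_qsl(...)) keeps one value per name (the last one wins on repeats).
single = dict(parse_qsl(q.query))

# Build a namedtuple type whose fields are the parameter names.
QS = namedtuple('QS', single.keys())
dotted = QS(**single)

print(dotted.title)
print(dotted.url)
```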

Hope that helps.

bimsapi
-1

To get just the query parameters split by the '&' you can use:

q.query.split('&')

Or to get pairs of parameter/value you can use:

args = [tuple(arg.split('=')) for arg in q.query.split('&')]
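One caveat with the manual split is that it can produce tuples of the wrong length when a value itself contains '='. The sketch below compares it with parse_qsl on an invented query string, assuming Python 3 (`urllib.parse`).

```python
from urllib.parse import parse_qsl

# Hypothetical query: the title value contains '=' and the url value is empty.
query = 'title=a=b&url='

# The manual split breaks the title into three parts instead of two.
naive = [tuple(arg.split('=')) for arg in query.split('&')]

# parse_qsl splits each pair only at the first '='; keep_blank_values=True
# keeps the empty url value, which is dropped by default.
pairs = parse_qsl(query, keep_blank_values=True)

print(naive)
print(pairs)
```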

avip