Best way to parse a URL query string

Question

What is the best way to parse data out of a URL query string (for instance, data appended to the URL by a form) in Python? My goal is to accept form data and display it on the same page. I've researched several methods, but none are quite what I'm looking for.

I'm creating a simple web server with the goal of learning about sockets. This web server won't be used for anything but testing purposes.

GET /?1pm=sample&2pm=&3pm=&4pm=&5pm= HTTP/1.1
Host: localhost:50000
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20100101 Firefox/11.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://localhost:50000/?1pm=sample&2pm=&3pm=&4pm=&5pm=

What's wrong with http://stackoverflow.com/questions/1349367/parse-an-http-request-authorization-header-with-python or http://stackoverflow.com/questions/4685217/parse-raw-http-headers. You haven't given us enough info about what other approaches are lacking. Do you have an example header or two? — Steven Rumbalski, Apr 11 '12 at 20:12
Nothing is 'wrong' with either of these posts. Based on the programming experience I've had in the past, I'm inclined to do something similar to the regular expression in the second link. However, I wanted to ask whether there is a simpler way to do it, since this is my first Python program. – egoskeptical, Apr 11 '12 at 20:24
Looks to me like you're talking about URL query strings, not HTTP headers. You might want to update your question to reflect this. — ʇsәɹoɈ, Apr 11 '12 at 20:57

jmunsch · Answer 1 · 2018-11-27T15:24:24.273

108

Here is an example using Python 3's urllib.parse:

from urllib.parse import urlparse, parse_qs
URL='https://someurl.com/with/query_string?i=main&mode=front&sid=12ab&enc=+Hello'
parsed_url = urlparse(URL)
parse_qs(parsed_url.query)

output:

{'i': ['main'], 'mode': ['front'], 'sid': ['12ab'], 'enc': [' Hello']}

Note for Python 2: from urlparse import urlparse, parse_qs

SEE: https://pythonhosted.org/six/#module-six.moves.urllib.parse
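Because parse_qs returns a list for every key, a little unwrapping is usually needed to get plain strings. The following is a minimal sketch of one way to do that; the variable names (params, flat and so on) are illustrative only, and taking the first item is an assumption that each key appears once.

```python
from urllib.parse import urlparse, parse_qs

url = "https://someurl.com/with/query_string?i=main&mode=front&sid=12ab&enc=+Hello"
params = parse_qs(urlparse(url).query)

# Every value is a list because a key may appear more than once;
# index 0 picks the first (here: only) value.
mode = params["mode"][0]

# .get() with a default list avoids a KeyError for keys that are absent.
missing = params.get("no_such_key", [None])[0]

# '+' in a query string decodes to a space; strip it if it is not wanted.
enc = params["enc"][0].strip()

# Flatten the whole result to one string per key (first value wins).
flat = {key: values[0] for key, values in params.items()}

print(mode)   # front
print(enc)    # Hello
print(flat)   # {'i': 'main', 'mode': 'front', 'sid': '12ab', 'enc': ' Hello'}
```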

edited Nov 27 '18 at 15:24

answered Oct 03 '16 at 23:24

jmunsch


3

And why are the values like this `['value']`? `dic['enc']` gets `['Hello']` - how to get 'Hello'? With split? – Suisse Jul 17 '17 at 01:36
3

@Suisse See https://stackoverflow.com/questions/11447391/ajax-why-jquery-replaces-with-a-space for the space. The values are in a list because multiple values can be encoded for one key; see https://stackoverflow.com/questions/2571145/urlencode-an-array-of-values for that. Hope it helps. – jmunsch Jul 18 '17 at 20:47

score 54 · Answer 2 · edited Aug 19 '16 at 20:31

54

The urllib.parse module is your friend: https://docs.python.org/3/library/urllib.parse.html

Check out urllib.parse.parse_qs, which parses a query string, i.e. form data sent to the server by GET or posted by POST (at least for non-multipart data). There's also cgi.FieldStorage for interpreting multipart data, although the cgi module was deprecated in Python 3.11 and removed in Python 3.13.

For parsing the rest of an HTTP interaction, see RFC2616, which is the HTTP/1.1 protocol specification.
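Applied to the request shown in the question, a rough sketch could look like the following. The helper name parse_request_line is hypothetical, and the code assumes the first line of the request is well formed (method, target and version separated by single spaces). Note that parse_qs drops empty values unless keep_blank_values=True is passed, which matters for a form where most fields are left blank.

```python
from urllib.parse import urlsplit, parse_qs


def parse_request_line(request_line):
    """Split 'GET /path?query HTTP/1.1' into method, path and query parameters.

    Hypothetical helper for illustration; it does no validation of the input.
    """
    method, target, version = request_line.strip().split(" ", 2)
    parts = urlsplit(target)
    # keep_blank_values=True keeps fields such as '2pm=' that were left empty.
    params = parse_qs(parts.query, keep_blank_values=True)
    return method, parts.path, params


method, path, params = parse_request_line(
    "GET /?1pm=sample&2pm=&3pm=&4pm=&5pm= HTTP/1.1"
)
print(method, path)  # GET /
print(params)
# {'1pm': ['sample'], '2pm': [''], '3pm': [''], '4pm': [''], '5pm': ['']}

# Default behaviour for comparison: blank fields are silently omitted.
print(parse_qs("1pm=sample&2pm=&3pm=&4pm=&5pm="))  # {'1pm': ['sample']}
```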

edited Aug 19 '16 at 20:31

Delgan


answered Apr 11 '12 at 20:11

modelnine


3

I'm not writing the script for him. He specifically asked how to parse query data, at least that's what I read between the lines, even though those are not actually HTTP headers. But I didn't bother commenting on that. – modelnine Apr 11 '12 at 20:14
I'm not suggesting that you should write the script for him, but urlparse is only a tiny piece of this puzzle. – Marcin Apr 11 '12 at 20:19
4

For the amount of information he gave, that's all there is to say. Specifically, if you're actually referring to HTTP headers: is he using a webserver which actually allows you to get HTTP headers uninterpreted (via some stream)? Is he using WSGI (where HTTP-headers are interpreted by the framework)? Plain-old CGI, where you have to interpret the environment and hope for the best? Whatever. – modelnine Apr 11 '12 at 20:22
urlparse looks like a great resource. The header is pretty simple and I've added it to the original question. As I'm sure you can guess, my initial idea is to parse the GET line into an array of strings. – egoskeptical Apr 11 '12 at 20:26
Are you trying to write a webserver? Or some form of packet inspection/inspector? – modelnine Apr 11 '12 at 20:31
As posted this is a simple web server that serves a web page consisting of a form. When the user clicks submit, the form inputs are appended to the URL. My goal is to parse the appended url, retrieve what was entered into the form, and display it on the page. – egoskeptical Apr 11 '12 at 20:36
Why not use a "proper" webserver to host your application? There's no need to reinvent the wheel (i.e., implement your own application server, which handles parsing the incoming request). Have you had a look at CherryPy or anything similar? I'm trying to discourage you, even for a pet/hobby project, from writing anything resembling a web server: HTTP/1.0 and later are a PITA to implement correctly. – modelnine Apr 11 '12 at 21:26
I'm only interested in writing my own, and finding the best way to parse a URL query string. – egoskeptical Apr 11 '12 at 21:56
If it's just the URL-query-string to parse, check out the modules I referenced in the answer. If you need to parse the full HTTP client interaction/request, you're in for some reading of RFC2616 (http://www.w3.org/Protocols/rfc2616/rfc2616.html) which describes the HTTP protocol. There's nothing "premade" for this kind of parsing in the Python stdlib. – modelnine Apr 11 '12 at 22:11
For Python 2, you're looking for `urlparse.parse_qs`. – freethebees Apr 05 '17 at 08:36

score 31 · Answer 3 · answered Oct 06 '17 at 08:05

31

If you need unique keys from the query string, use dict() with parse_qsl():

>>> import urllib.parse
>>> urllib.parse.urlparse('https://someurl.com/with/query_string?a=1&b=2&b=3').query
'a=1&b=2&b=3'
>>> urllib.parse.parse_qs('a=1&b=2&b=3')
{'a': ['1'], 'b': ['2', '3']}
>>> urllib.parse.parse_qsl('a=1&b=2&b=3')
[('a', '1'), ('b', '2'), ('b', '3')]
>>> dict(urllib.parse.parse_qsl('a=1&b=2&b=3'))  # a repeated key keeps only its last value
{'a': '1', 'b': '3'}

answered Oct 06 '17 at 08:05

ahuigo


It's important to note that converting the list of tuples to a dict does not keep both values of `b`: one of them is silently dropped. Wasn't aware of [`parse_qsl`](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.parse_qsl), good addition. – Kristoffer Bakkejord Sep 04 '20 at 19:07

score 8 · Answer 4 · answered Jun 18 '19 at 20:38

8

Built into Python 2.7:

>>> from urlparse import parse_qs
>>> parse_qs("search=quint&tags=python")
{'search': ['quint'], 'tags': ['python']}
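If the same code has to run under both Python 2.7 and Python 3, one common pattern is to try the Python 3 import first and fall back to the Python 2 module name. This is only a sketch of that pattern; the six library mentioned elsewhere on this page is an alternative.

```python
try:
    # Python 3: the function lives in urllib.parse
    from urllib.parse import parse_qs
except ImportError:
    # Python 2: the same function lives in the urlparse module
    from urlparse import parse_qs

result = parse_qs("search=quint&tags=python")
print(result["search"])  # ['quint']
print(result["tags"])    # ['python']
```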

answered Jun 18 '19 at 20:38

Cuyler Quint


score 2 · Answer 5 · answered May 07 '21 at 11:26

2

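Only for quick one-line prototyping of CGI variables without imports. Obviously not the best approach, but it could be useful.

# env is assumed to be the CGI/WSGI environment dict; partition() tolerates items with no '=' or with an extra '='
args = dict(item.partition('=')[::2] for item in env['QUERY_STRING'].split('&') if item)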
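To see what the manual approach leaves undone, here is a small comparison with urllib.parse.parse_qsl. The query string is a made-up example, and the manual line is a simplified stand-in for the one-liner above rather than a copy of it.

```python
from urllib.parse import parse_qsl

# Made-up example: a UTF-8 name with '+' for spaces and an encoded '=' in a value.
query = "name=J%C3%BCrgen+M%C3%BCller&note=a%3Db"

# Manual splitting: percent-escapes and '+' are left exactly as they arrived.
manual = dict(item.split("=", 1) for item in query.split("&") if item)

# parse_qsl decodes percent-escapes (as UTF-8 by default) and turns '+' into a space.
parsed = dict(parse_qsl(query))

print(manual)  # {'name': 'J%C3%BCrgen+M%C3%BCller', 'note': 'a%3Db'}
print(parsed)  # {'name': 'Jürgen Müller', 'note': 'a=b'}
```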

answered May 07 '21 at 11:26

ollofx


4

This will break if any parameter in the query string is URL-encoded. "Manual parsing" of URLs is the source of many security issues. – Daniel Serodio Oct 13 '21 at 19:24
2

Indeed, hence the warning "only for prototyping". I posted it to showcase quick parsing without any import. – ollofx Oct 19 '21 at 12:40
1

I wonder if every URL parser is a "manual parser"? At some point someone had to sit down and write it... – étale-cohomology Feb 15 '22 at 10:26
