93

What is the best way to parse data out of a URL query string (for instance, data appended to the URL by a form) in Python? My goal is to accept form data and display it on the same page. I've researched several methods, but none of them is quite what I'm looking for.

I'm creating a simple web server with the goal of learning about sockets. This web server won't be used for anything but testing purposes.

GET /?1pm=sample&2pm=&3pm=&4pm=&5pm= HTTP/1.1
Host: localhost:50000
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20100101 Firefox/11.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://localhost:50000/?1pm=sample&2pm=&3pm=&4pm=&5pm=
egoskeptical
  • Are you looking to write the parsing from scratch, or what? – Marcin Apr 11 '12 at 20:11
  • 2
    What's wrong with http://stackoverflow.com/questions/1349367/parse-an-http-request-authorization-header-with-python or http://stackoverflow.com/questions/4685217/parse-raw-http-headers. You haven't given us enough info about what other approaches are lacking. Do you have an example header or two? – Steven Rumbalski Apr 11 '12 at 20:12
  • Nothing is 'wrong' with either of these posts. Based on my past programming experience, I'm inclined to do something similar to the regular expression in the second link. However, I wanted to ask whether there is a simpler way to do it, since this is my first Python program. – egoskeptical Apr 11 '12 at 20:24
  • Looks to me like you're talking about URL query strings, not HTTP headers. You might want to update your question to reflect this. – ʇsәɹoɈ Apr 11 '12 at 20:57

5 Answers

108

Here is an example using Python 3's urllib.parse:

from urllib.parse import urlparse, parse_qs
URL='https://someurl.com/with/query_string?i=main&mode=front&sid=12ab&enc=+Hello'
parsed_url = urlparse(URL)
parse_qs(parsed_url.query)

output:

{'i': ['main'], 'mode': ['front'], 'sid': ['12ab'], 'enc': [' Hello']}

Note for Python 2: from urlparse import urlparse, parse_qs

For code that has to run on both versions, see six.moves.urllib.parse: https://pythonhosted.org/six/#module-six.moves.urllib.parse
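Because parse_qs returns a list for every key, a common follow-up is how to get a plain string back. The snippet below is a minimal sketch of two ways to do that. The names `params`, `enc`, `missing` and `first_values` are only illustrative, and it assumes you are happy to keep just the first value of each key.

```python
from urllib.parse import urlparse, parse_qs

url = 'https://someurl.com/with/query_string?i=main&mode=front&sid=12ab&enc=+Hello'
params = parse_qs(urlparse(url).query)

# Index into the list to get a single value ('+' was decoded to a space).
enc = params['enc'][0]

# For keys that may be absent, fall back to a one-element default list.
missing = params.get('nope', [None])[0]

# Flatten the whole mapping, keeping only the first value of each key.
first_values = {key: values[0] for key, values in params.items()}
print(first_values)
```

If a key can legitimately appear more than once (for example a multi-select form field), keep the lists instead of flattening them.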

jmunsch
  • 3
    And why are the values wrapped in a list, like ['value']? dic['enc'] returns ['Hello']; how do I get 'Hello'? With split? – Suisse Jul 17 '17 at 01:36
  • 3
    @Suisse See https://stackoverflow.com/questions/11447391/ajax-why-jquery-replaces-with-a-space for why '+' is decoded as a space. The values are in a list because a key can carry multiple values; see https://stackoverflow.com/questions/2571145/urlencode-an-array-of-values. Hope it helps. – jmunsch Jul 18 '17 at 20:47
54

The urllib.parse module is your friend: https://docs.python.org/3/library/urllib.parse.html

Check out urllib.parse.parse_qs (it parses a query string, i.e. form data sent to the server by GET or form data posted by POST, at least for non-multipart data). There's also cgi.FieldStorage for interpreting multipart data, but note that the cgi module was deprecated in Python 3.11 and removed in Python 3.13.

For parsing the rest of an HTTP interaction, see RFC 2616, the original HTTP/1.1 protocol specification (since superseded by RFC 9110 and RFC 9112).
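As a rough illustration of how this fits a hand-written socket server, the sketch below takes the request from the question, pulls the target out of the request line, and feeds its query part to parse_qs. The helper name `parse_request_line` is hypothetical, and the code assumes the request has already been read from the socket and decoded to a string. It does no validation of malformed requests.

```python
from urllib.parse import urlsplit, parse_qs

def parse_request_line(request_line):
    """Split 'METHOD target HTTP/x.y' and parse the target's query string."""
    method, target, version = request_line.split(" ", 2)
    query = urlsplit(target).query
    # keep_blank_values=True keeps empty form fields such as '2pm='.
    params = parse_qs(query, keep_blank_values=True)
    return method, params

# Assumed input: the decoded text of the request shown in the question.
raw = (
    "GET /?1pm=sample&2pm=&3pm=&4pm=&5pm= HTTP/1.1\r\n"
    "Host: localhost:50000\r\n"
    "\r\n"
)
first_line = raw.split("\r\n", 1)[0]
method, params = parse_request_line(first_line)
print(method, params)
```

By default parse_qs drops keys with empty values, so without keep_blank_values=True only '1pm' would show up here.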

Delgan
modelnine
  • 3
    I'm not writing the script for him. He specifically asked how to parse query data, at least that's what I read between the lines, even though those are not actually HTTP headers. But I didn't bother commenting on that. – modelnine Apr 11 '12 at 20:14
  • I'm not suggesting that you should write the script for him, but urlparse is only a tiny piece of this puzzle. – Marcin Apr 11 '12 at 20:19
  • 4
    For the amount of information he gave, that's all there is to say. Specifically, if you're actually referring to HTTP headers: is he using a webserver which actually allows you to get HTTP headers uninterpreted (via some stream)? Is he using WSGI (where HTTP-headers are interpreted by the framework)? Plain-old CGI, where you have to interpret the environment and hope for the best? Whatever. – modelnine Apr 11 '12 at 20:22
  • urlparse looks like a great resource. The header is pretty simple and I've added it to the original question. As I'm sure you can guess, my initial idea is to parse the get line into an array of strings. – egoskeptical Apr 11 '12 at 20:26
  • Are you trying to write a webserver? Or some form of packet inspection/inspector? – modelnine Apr 11 '12 at 20:31
  • As posted this is a simple web server that serves a web page consisting of a form. When the user clicks submit, the form inputs are appended to the URL. My goal is to parse the appended url, retrieve what was entered into the form, and display it on the page. – egoskeptical Apr 11 '12 at 20:36
  • Why not use a "proper" webserver to host your application? There's no need to reinvent the wheel (i.e., implement your own application server, which handles parsing the incoming request). Have you had a look at CherryPy or anything similar? I'm trying to discourage you from writing anything resembling a web server, even as a pet/hobby project; HTTP/1.0+ is a PITA to implement correctly. – modelnine Apr 11 '12 at 21:26
  • I'm only interested in writing my own, and finding the best way to parse a URL query string. – egoskeptical Apr 11 '12 at 21:56
  • If it's just the URL-query-string to parse, check out the modules I referenced in the answer. If you need to parse the full HTTP client interaction/request, you're in for some reading of RFC2616 (http://www.w3.org/Protocols/rfc2616/rfc2616.html) which describes the HTTP protocol. There's nothing "premade" for this kind of parsing in the Python stdlib. – modelnine Apr 11 '12 at 22:11
  • For Python 2, you're looking for `urlparse.parse_qs`. – freethebees Apr 05 '17 at 08:36
31

If you need a single value per key from the query string, use dict() with parse_qsl(). Note that for a repeated key only the last value is kept.

>>> import urllib.parse
>>> urllib.parse.urlparse('https://someurl.com/with/query_string?a=1&b=2&b=3').query
'a=1&b=2&b=3'
>>> urllib.parse.parse_qs('a=1&b=2&b=3')
{'a': ['1'], 'b': ['2', '3']}
>>> urllib.parse.parse_qsl('a=1&b=2&b=3')
[('a', '1'), ('b', '2'), ('b', '3')]
>>> dict(urllib.parse.parse_qsl('a=1&b=2&b=3'))
{'a': '1', 'b': '3'}
ahuigo
  • It's important to note that converting the list of tuples to a dict doesn't keep both values of `b`; one of them gets silently dropped. I wasn't aware of [`parse_qsl`](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.parse_qsl), good addition. – Kristoffer Bakkejord Sep 04 '20 at 19:07
8

This is built into Python 2.7 (in Python 3, the same function lives in urllib.parse):

>>> from urlparse import parse_qs
>>> parse_qs("search=quint&tags=python")
{'search': ['quint'], 'tags': ['python']}
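If the same script has to run under both Python 2.7 and Python 3, one possible approach (a sketch, not the only way) is to try the Python 3 import first and fall back to the Python 2 module:

```python
try:
    # Python 3 location
    from urllib.parse import parse_qs
except ImportError:
    # Python 2.7 location
    from urlparse import parse_qs

result = parse_qs("search=quint&tags=python")
print(result)
```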
Cuyler Quint
2

This is only for quick one-line prototyping of CGI variables without imports. It is obviously not the best approach (it does not decode percent-escapes or '+'), but it could be useful.

args = dict(item.partition('=')[::2] for item in env['QUERY_STRING'].split('&') if item)
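If a single import is acceptable, the same idea can be extended to decode the values as well. The following is only a sketch: the function name `parse_query` is made up for illustration, and, like the one-liner, it keeps only the last value of a repeated key.

```python
from urllib.parse import unquote_plus

def parse_query(query_string):
    """Hand-rolled query parser: splits on '&' and '=', then URL-decodes."""
    pairs = {}
    for item in query_string.split('&'):
        if not item:
            continue  # skip empty segments such as a trailing '&'
        key, _, value = item.partition('=')  # value is '' when '=' is absent
        pairs[unquote_plus(key)] = unquote_plus(value)
    return pairs

decoded = parse_query('name=J%C3%BCrgen+M&empty=&flag')
print(decoded)
```

For anything beyond prototyping, urllib.parse.parse_qs or parse_qsl is the safer choice, since they already handle these cases.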
ollofx