I'm trying to parse mailto URLs into a nice object or dictionary which includes subject, body, etc. I can't seem to find a library or class that achieves this. Do you know of any?
mailto:me@mail.com?subject=mysubject&body=mybody
You can use urlparse and parse_qs to parse URLs with mailto as the scheme. Be aware, though, that according to the scheme definition:
mailto:me@mail.com,you@mail.com?subject=mysubject
is identical to
mailto:?to=me@mail.com&to=you@mail.com&subject=mysubject
Here's an example:
from urlparse import urlparse, parse_qs
from email.message import Message
url = 'mailto:me@mail.com?subject=mysubject&body=mybody&to=you@mail.com'
msg = Message()
parsed_url = urlparse(url)
header = parse_qs(parsed_url.query)
header['to'] = header.get('to', []) + parsed_url.path.split(',')
for k,v in header.iteritems():
    msg[k] = ', '.join(v)
print msg.as_string()
# Will print:
# body: mybody
# to: you@mail.com, me@mail.com
# subject: mysubject
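To make the equivalence of the two URL forms concrete, here is a small Python 3 sketch. The helper name mailto_recipients is made up for illustration, and it assumes all you care about is the merged recipient list (it does not split commas inside a to= value).

```python
from urllib.parse import parse_qs, unquote, urlparse

def mailto_recipients(url):
    """Return every recipient of a mailto URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Addresses in the path are comma-separated and may be percent-encoded.
    from_path = [unquote(addr) for addr in parsed.path.split(",") if addr]
    # parse_qs percent-decodes query values; 'to' may appear several times.
    from_query = parse_qs(parsed.query).get("to", [])
    return from_path + from_query

a = mailto_recipients("mailto:me@mail.com,you@mail.com?subject=mysubject")
b = mailto_recipients("mailto:?to=me@mail.com&to=you@mail.com&subject=mysubject")
print(a == b)  # True
```

Both forms should come out as the same list, which is the property the scheme definition describes.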
The core urlparse lib does a less than stellar job on mailto URLs, but gets you halfway there:
In [3]: from urlparse import urlparse
In [4]: urlparse("mailto:me@mail.com?subject=mysubject&body=mybody")
Out[4]: ParseResult(scheme='mailto', netloc='', path='me@mail.com?subject=mysubject&body=mybody', params='', query='', fragment='')
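For comparison, here is the same check on Python 3, where these functions live in urllib.parse. As far as I can tell, current versions split the query for every scheme, so the result differs from the output above. Treat this as something to run on your own interpreter rather than a guarantee for every version.

```python
from urllib.parse import urlparse

parsed = urlparse("mailto:me@mail.com?subject=mysubject&body=mybody")
print(parsed.path)   # me@mail.com
print(parsed.query)  # subject=mysubject&body=mybody
```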
EDIT
A little research unearths this thread. Bottom line: python url parsing sucks.
Seems like you might just want to write your own function to do this.
Edit: Here is a sample function (written by a python noob).
Edit 2, cleanup due to feedback:
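from urllib import unquote

test_mailto = 'mailto:me@mail.com?subject=mysubject&body=mybody'

def parse_mailto(mailto):
    result = dict()
    # Drop the 'mailto:' scheme, then separate the address from the query.
    colon_split = mailto.split(':', 1)
    quest_split = colon_split[1].split('?', 1)
    result['email'] = quest_split[0]
    # The query part is optional, so only parse it when it is present.
    if len(quest_split) > 1:
        for pair in quest_split[1].split('&'):
            # partition keeps any further '=' characters inside the value.
            name, _, value = pair.partition('=')
            result[unquote(name)] = unquote(value)
    return result

print parse_mailto(test_mailto)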
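One reason a hand-rolled parser built on unquote can still be worth having: unquote leaves a literal + alone, whereas parse_qs follows HTML form rules and turns + into a space. My understanding is that mailto URLs do not use + for spaces, so this can matter for plus-addressed recipients or subjects containing a plus sign. A small Python 3 sketch of the difference (the sample query string is made up):

```python
from urllib.parse import parse_qs, unquote

raw = "subject=a+b%20c"

# parse_qs treats '+' as an encoded space.
via_parse_qs = parse_qs(raw)["subject"][0]

# unquote only decodes %XX escapes and keeps '+' as is.
name, _, value = raw.partition("=")
via_unquote = unquote(value)

print(repr(via_parse_qs))  # 'a b c'
print(repr(via_unquote))   # 'a+b c'
```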
Here is a solution using the re module...
import re

def parse_mailto(a):
    # Capture the address, then the subject and body values, in that fixed order.
    m = re.match(r'mailto:([^?]+)\?subject=([^&]*)&body=(.*)', a)
    if m is None:
        return None
    email, subject, body = m.groups()
    return {'email': email, 'subject': subject, 'body': body}
This assumes it is in the same format as you posted. You may need to make modifications to better fit your needs.
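If the fields might arrive in a different order, one possible modification is to match the address once and then collect every name=value pair with findall. This is only a sketch in Python 3; parse_mailto_re is a made-up name, and it assumes the address itself contains no ? or & characters.

```python
import re
from urllib.parse import unquote

def parse_mailto_re(url):
    """Hypothetical regex-based parser that accepts fields in any order."""
    m = re.match(r"mailto:([^?]*)", url)
    if m is None:
        return None
    result = {"email": unquote(m.group(1))}
    # Each field is introduced by '?' or '&' and runs up to the next '&'.
    for name, value in re.findall(r"[?&]([^=&]+)=([^&]*)", url):
        result[unquote(name)] = unquote(value)
    return result

print(parse_mailto_re("mailto:me@mail.com?body=mybody&subject=mysubject"))
```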
You should use a dedicated library such as this one:
https://pypi.python.org/pypi/urlinfo
and contribute or open issues there to make Python better ;)
P.S. Don't use Robbert Peters' solution, because it is a hack and does not work properly. Also, using a regular expression here is like using a BFG to shoot a small bird.
I like Alexander's answer but it is in Python 2! We now get urlparse() and parse_qs() from urllib.parse. Also note that sorting the header in reverse puts the keys in reverse alphabetical order, for example: to, from, body.
from email.message import Message
from pathlib import Path
from urllib.parse import parse_qs, urlparse
url = Path("link.txt").read_text()
msg = Message()
parsed_url = urlparse(url)
header = parse_qs(parsed_url.query)
header["to"] = header.get("to", []) + parsed_url.path.split(",")
for k, v in sorted(header.items(), reverse=True):
    print(f"{k}:", v[0])
I am just using this as a one-off. When I used msg.as_string() I got some strange results, so I just printed the strings directly. Each value is a list with a single entry, so I take element 0 to get a plain string.
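One guess at the strange msg.as_string() output: body is not a real header, so assigning it like one puts the body text into the header block. Here is a minimal Python 3 sketch, assuming you want the body as the message payload instead. mailto_to_message is a name I made up, and it does no validation of header names or values.

```python
from email.message import Message
from urllib.parse import parse_qs, unquote, urlparse

def mailto_to_message(url):
    """Build a Message from a mailto URL (hypothetical helper, no validation)."""
    parsed = urlparse(url.strip())
    fields = parse_qs(parsed.query)
    # Recipients can come from the path and from any 'to' fields.
    recipients = [unquote(a) for a in parsed.path.split(",") if a]
    recipients += fields.pop("to", [])
    # Pull the body out so it becomes the payload, not a header.
    body = "\n".join(fields.pop("body", []))
    msg = Message()
    if recipients:
        msg["To"] = ", ".join(recipients)
    for name, values in fields.items():
        msg[name] = ", ".join(values)
    msg.set_payload(body)
    return msg

msg = mailto_to_message("mailto:me@mail.com?subject=mysubject&body=mybody&to=you@mail.com")
print(msg.as_string())
```

With this split, as_string() should show the headers first, then a blank line, then the body text.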
Batteries included: urlparse.
import urllib
query = 'mailto:me@mail.com?subject=mysubject&body=mybody'.partition('?')[2]
print dict((urllib.unquote(s).decode('utf-8') for s in pair.partition('=')[::2])
           for pair in query.split('&'))
# -> {u'body': u'mybody', u'subject': u'mysubject'}
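A rough Python 3 counterpart of the one-liner above, assuming each field appears only once (with repeated keys, dict() keeps the last value):

```python
from urllib.parse import parse_qsl, urlsplit

url = "mailto:me@mail.com?subject=mysubject&body=mybody"
# parse_qsl yields decoded (name, value) pairs, which dict() accepts directly.
fields = dict(parse_qsl(urlsplit(url).query))
print(fields)  # {'subject': 'mysubject', 'body': 'mybody'}
```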