Parse mailto urls in Python

Question

I'm trying to parse mailto URLs into a nice object or dictionary that includes subject, body, etc. I can't seem to find a library or class that achieves this. Do you know of any?

mailto:me@mail.com?subject=mysubject&body=mybody

Using the `re` module could be a fast solution. – juliomalegria Jan 30 '12 at 17:07

score 4 · Answer 1 · answered Jul 30 '15 at 20:56

You can use `urlparse` and `parse_qs` to parse URLs with the `mailto` scheme. Be aware, though, that according to the scheme definition:

mailto:me@mail.com,you@mail.com?subject=mysubject

is identical to

mailto:?to=me@mail.com&to=you@mail.com&subject=mysubject
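A quick way to see the difference between the two forms is to feed each one to Python 3's `urllib.parse` (this sketch assumes Python 3; the answer's own code is Python 2). The addresses end up in different places, which is why the recipient list has to be assembled from both the path and the `to` query fields:

```python
from urllib.parse import parse_qs, urlparse

# Form 1: recipients in the path, comma-separated
path_form = urlparse("mailto:me@mail.com,you@mail.com?subject=mysubject")
print(path_form.path.split(","))   # ['me@mail.com', 'you@mail.com']
print(parse_qs(path_form.query))   # {'subject': ['mysubject']}

# Form 2: empty path, recipients as repeated "to" query fields
query_form = urlparse("mailto:?to=me@mail.com&to=you@mail.com&subject=mysubject")
print(repr(query_form.path))       # ''
print(parse_qs(query_form.query))  # {'to': ['me@mail.com', 'you@mail.com'], 'subject': ['mysubject']}
```

`parse_qs` gathers the repeated `to` keys into a single list, so both forms yield the same two addresses once path and query are combined.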

Here's an example:

from urlparse import urlparse, parse_qs
from email.message import Message

url = 'mailto:me@mail.com?subject=mysubject&body=mybody&to=you@mail.com'
msg = Message()
parsed_url = urlparse(url)

header = parse_qs(parsed_url.query)
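# Addresses in the path come first; skip the empty path of "mailto:?to=..."
header['to'] = [addr for addr in parsed_url.path.split(',') if addr] + header.get('to', [])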

for k,v in header.iteritems():
    msg[k] = ', '.join(v)

print msg.as_string()

# Will print:
# body: mybody
# to: me@mail.com, you@mail.com
# subject: mysubject
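The snippet above is Python 2 (`urlparse` module, `iteritems`, the `print` statement). Here is a minimal Python 3 sketch of the same idea, assuming you want a plain dictionary rather than an `email.message.Message`; the name `mailto_to_dict` is made up for this example. It also percent-decodes the path addresses, which `urlparse` leaves encoded:

```python
from urllib.parse import parse_qs, unquote, urlparse


def mailto_to_dict(url):
    """Hypothetical helper: turn a mailto: URL into {field: value}."""
    parsed = urlparse(url)
    if parsed.scheme != "mailto":
        raise ValueError("not a mailto: URL: %r" % url)
    # parse_qs percent-decodes keys and values and groups repeated keys into lists
    fields = parse_qs(parsed.query)
    # Path addresses are comma-separated and NOT percent-decoded by urlparse
    path_addrs = [unquote(addr) for addr in parsed.path.split(",") if addr]
    fields["to"] = path_addrs + fields.get("to", [])
    # Join each list into one string, as the Message-based version does
    return {key: ", ".join(values) for key, values in fields.items()}


print(mailto_to_dict("mailto:me@mail.com?subject=mysubject&body=mybody&to=you@mail.com"))
# {'subject': 'mysubject', 'body': 'mybody', 'to': 'me@mail.com, you@mail.com'}
```

Splitting the path on commas before decoding means an encoded `%2C` inside an address is not mistaken for a separator.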

score 2 · Answer 2 · edited May 23 '17 at 11:51

The core urlparse lib does less than a stellar job on mailtos, but gets you halfway there:

In [3]: from urlparse import urlparse

In [4]: urlparse("mailto:me@mail.com?subject=mysubject&body=mybody")
Out[4]: ParseResult(scheme='mailto', netloc='', path='me@mail.com?subject=mysubject&body=mybody', params='', query='', fragment='')
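For comparison, in current Python 3 `urllib.parse.urlparse` separates the query for any scheme, `mailto` included, so `parse_qs` can take it from there. A quick check, assuming Python 3:

```python
from urllib.parse import parse_qs, urlparse

parsed = urlparse("mailto:me@mail.com?subject=mysubject&body=mybody")
print(parsed)
# ParseResult(scheme='mailto', netloc='', path='me@mail.com', params='',
#             query='subject=mysubject&body=mybody', fragment='')
print(parse_qs(parsed.query))  # {'subject': ['mysubject'], 'body': ['mybody']}
```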

EDIT

A little research unearths this thread. Bottom line: python url parsing sucks.

answered Jan 30 '12 at 17:11

Alien Life Form

Why it does not catch the query part beats me, tho' – Alien Life Form Jan 30 '12 at 17:12
Tried this; it doesn't seem to do anything but grab the scheme. – Yarin Jan 30 '12 at 17:16
It should also url-decode the chunks. No big feat, but still. – Alien Life Form Jan 30 '12 at 17:23
`urlparse()` returns the correct result; see [RFC 3986](http://tools.ietf.org/html/rfc3986) – jfs Jan 30 '12 at 18:19
Going to an RFC for the subject matter is really strange, but it so happens that urlparse is not even correct with regard to the RFC or its own documentation, since it purports to separate the query part - but does not. "URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]" – Alien Life Form Jan 31 '12 at 12:14
A URL containing `#` won't be parsed fully: e.g. in `domain.com/value#additional_data`, everything after `#` is dropped. – softmarshmallow Nov 30 '18 at 14:34
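Two points from these comments, checked against Python 3's `urllib.parse` in a small sketch (the URLs are invented for illustration). `urlparse` leaves the path percent-encoded, so decoding it is up to you, while `parse_qs` decodes the query for you. Text after `#` is not lost but moved into the `fragment` attribute, so a literal `#` meant for a subject or body has to be written as `%23`:

```python
from urllib.parse import parse_qs, unquote, urlparse

parsed = urlparse("mailto:me%40mail.com?subject=Hello%20World#frag")
print(parsed.path)             # me%40mail.com  (still percent-encoded)
print(unquote(parsed.path))    # me@mail.com
print(parse_qs(parsed.query))  # {'subject': ['Hello World']}
print(parsed.fragment)         # frag

# An encoded "#" stays inside the query value
encoded = urlparse("mailto:me@mail.com?subject=Issue%20%231")
print(parse_qs(encoded.query))  # {'subject': ['Issue #1']}
```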

Robert Peters · Accepted Answer · 2012-01-30T20:28:19.823

Seems like you might just want to write your own function to do this.

Edit: Here is a sample function (written by a python noob).

Edit 2, cleanup due to feedback:

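try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2

test_mailto = 'mailto:me@mail.com?subject=mysubject&body=mybody'

def parse_mailto(mailto):
    result = dict()
    colon_split = mailto.split(':', 1)          # drop the "mailto" scheme
    quest_split = colon_split[1].split('?', 1)  # address vs. query string
    result['email'] = unquote(quest_split[0])
    if len(quest_split) == 1:
        return result  # no query string at all

    for pair in quest_split[1].split('&'):
        if not pair:
            continue  # tolerate "&&" or a trailing "&"
        name, _, value = pair.partition('=')  # split at the first "=" only
        result[unquote(name)] = unquote(value)

    return result

print(parse_mailto(test_mailto))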

edited Jan 30 '12 at 20:28

answered Jan 30 '12 at 17:05

Robert Peters

You should probably use `.split(sep, 1)` to limit to one split, and save the results instead of splitting multiple times. Plus, you will need `urllib.unquote()` to decode `%xx` placeholders in the query string keys and variables. – Ferdinand Beyer Jan 30 '12 at 18:26
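Both suggestions in this comment, in miniature (Python 3's `urllib.parse.unquote` is shown; in Python 2 it is `urllib.unquote`). The pair below contains a raw `=` inside the value, which should normally be percent-encoded, but a lenient parser still should not truncate it:

```python
from urllib.parse import unquote

pair = "body=1+1=2%20%28maybe%29"

# Splitting on every "=" cuts the value short
print(pair.split("="))       # ['body', '1+1', '2%20%28maybe%29']

# Limiting to one split keeps the whole value; unquote then decodes %xx escapes
name, value = pair.split("=", 1)
print(name, unquote(value))  # body 1+1=2 (maybe)
```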

score 1 · Answer 4 · answered Jan 30 '12 at 17:34

Here is a solution using the re module...

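import re

def parse_mailto(a):
    d = {}
    # Address: everything after "mailto:" up to the first "?" (or the end)
    m = re.search(r'mailto:([^?]*)', a)
    d['email'] = m.group(1) if m else None
    # Each value runs from "name=" up to the next "&" (or the end)
    m = re.search(r'[?&]subject=([^&]*)', a)
    d['subject'] = m.group(1) if m else None
    m = re.search(r'[?&]body=([^&]*)', a)
    d['body'] = m.group(1) if m else None
    return d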

This assumes it is in the same format as you posted. You may need to make modifications to better fit your needs.
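If you do want to stay with `re`, here is a more general sketch, assuming Python 3 and the URL layout from the question. `parse_mailto_re`, `MAILTO_RE` and `PAIR_RE` are illustrative names, not a library API. It captures the address and every `key=value` pair instead of hard-coding `subject` and `body`:

```python
import re
from urllib.parse import unquote

# Illustrative patterns: the address runs up to "?", the query up to "#"
MAILTO_RE = re.compile(r"^mailto:(?P<to>[^?#]*)(?:\?(?P<query>[^#]*))?")
# One key=value pair; the value stops at the next "&"
PAIR_RE = re.compile(r"([^&=]+)=([^&]*)")


def parse_mailto_re(url):
    m = MAILTO_RE.match(url)
    if m is None:
        raise ValueError("not a mailto: URL: %r" % url)
    fields = {"to": unquote(m.group("to"))}
    # A later duplicate key (e.g. a "to" field in the query) overwrites an earlier one
    for key, value in PAIR_RE.findall(m.group("query") or ""):
        fields[unquote(key)] = unquote(value)
    return fields


print(parse_mailto_re("mailto:me@mail.com?subject=mysubject&body=mybody"))
# {'to': 'me@mail.com', 'subject': 'mysubject', 'body': 'mybody'}
```

Unlike `parse_qs`, this keeps only the last value of a repeated key such as `to`.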

score 0 · Answer 5 · answered May 14 '15 at 20:17

You should use a dedicated library such as this one:

https://pypi.python.org/pypi/urlinfo

and contribute to it and open issues to make Python better ;)

P.S. Don't use Robert Peters' solution, because it is a hack and does not work properly. Also, using a regular expression here is like using a super BFG gun to shoot a small bird.

score 0 · Answer 6 · answered Feb 23 '23 at 18:13

I like Alexander's answer, but it is in Python 2! We now get urlparse() and parse_qs() from urllib.parse. Also note that sorting the header in reverse puts it in the order: to, subject, body.

from pathlib import Path
from urllib.parse import parse_qs, unquote, urlparse

# link.txt holds one mailto: URL; strip() drops the file's trailing newline
url = Path("link.txt").read_text().strip()
parsed_url = urlparse(url)
header = parse_qs(parsed_url.query)  # every value comes back as a list
# Path addresses are comma-separated and still percent-encoded
header["to"] = header.get("to", []) + [
    unquote(addr) for addr in parsed_url.path.split(",") if addr
]

for k, v in sorted(header.items(), reverse=True):
    print(f"{k}:", ", ".join(v))

I am just using this as a one-off. When I used msg.as_string() I got some strange results, so I just printed the strings instead. The values come back as lists, so I join their entries with ", " to make a string; taking only the 0'th entry would drop any extra recipients.

score 0 · Answer 7 · answered Jan 30 '12 at 17:10

Batteries included: urlparse.

Ferdinand Beyer

Doesn't work: the urlparse result is `ParseResult(scheme='mailto', netloc='', path='me@mail.com?subject=mysubject&body=mybody', params='', query='', fragment='')`, so it does not read subject/body/etc. – Yarin Jan 30 '12 at 17:12

score 0 · Answer 8 · answered Jan 30 '12 at 17:32

import urllib

query = 'mailto:me@mail.com?subject=mysubject&body=mybody'.partition('?')[2]
print dict((urllib.unquote(s).decode('utf-8') for s in pair.partition('=')[::2])
           for pair in query.split('&'))
# -> {u'body': u'mybody', u'subject': u'mysubject'}
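The snippet above is Python 2. In Python 3, `urllib.parse.unquote` already returns a decoded `str` (UTF-8 by default), so the `.decode('utf-8')` step goes away. A sketch of the same one-liner, plus the standard library's `urllib.parse.parse_qsl`, which gives the same result for this input:

```python
from urllib.parse import parse_qsl, unquote

query = "mailto:me@mail.com?subject=mysubject&body=mybody".partition("?")[2]

# Same idea as above: split each pair at the first "=" and decode both halves
fields = dict(
    (unquote(name), unquote(value))
    for name, _, value in (pair.partition("=") for pair in query.split("&"))
)
print(fields)  # {'subject': 'mysubject', 'body': 'mybody'}

# Standard-library equivalent for this input
print(dict(parse_qsl(query)))  # {'subject': 'mysubject', 'body': 'mybody'}
```

`parse_qsl` differs in the details: it also turns `+` into a space and, by default, drops pairs with empty values.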

jfs

Thanks bladerunner, this works too. Gave it to Robert because he was first. – Yarin Jan 30 '12 at 17:37
