
I'm trying to parse mailto URLs into a nice object or dictionary that includes subject, body, etc. I can't seem to find a library or class that does this. Do you know of any?

mailto:me@mail.com?subject=mysubject&body=mybody
Yarin

8 Answers


You can use urlparse and parse_qs to parse URLs that use mailto as the scheme. Be aware, though, that according to the scheme definition (RFC 6068):

mailto:me@mail.com,you@mail.com?subject=mysubject

is identical to

mailto:?to=me@mail.com&to=you@mail.com&subject=mysubject
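A minimal Python 3 sketch of that equivalence, assuming only the standard-library `urllib.parse` module; the helper name `recipients` is hypothetical. It merges the comma-separated addresses in the path with any `to` fields from the query, so both spellings yield the same recipient list.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def recipients(url):
    """Return all "to" addresses of a mailto: URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Addresses in the path are comma-separated and may be percent-encoded
    from_path = [unquote(a) for a in parts.path.split(",") if a]
    # parse_qs percent-decodes the query and returns a list of values per key
    from_query = parse_qs(parts.query).get("to", [])
    return from_path + from_query


a = recipients("mailto:me@mail.com,you@mail.com?subject=mysubject")
b = recipients("mailto:?to=me@mail.com&to=you@mail.com&subject=mysubject")
print(a)       # ['me@mail.com', 'you@mail.com']
print(a == b)  # True
```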

Here's an example (Python 2):

from urlparse import urlparse, parse_qs
from email.message import Message

url = 'mailto:me@mail.com?subject=mysubject&body=mybody&to=you@mail.com'
msg = Message()
parsed_url = urlparse(url)

header = parse_qs(parsed_url.query)
header['to'] = header.get('to', []) + parsed_url.path.split(',')

for k,v in header.iteritems():
    msg[k] = ', '.join(v)

print msg.as_string()

# Will print:
# body: mybody
# to: me@mail.com, you@mail.com
# subject: mysubject

The core urlparse library does less than a stellar job on mailto URLs, but it gets you halfway there:

In [3]: from urlparse import urlparse

In [4]: urlparse("mailto:me@mail.com?subject=mysubject&body=mybody")
Out[4]: ParseResult(scheme='mailto', netloc='', path='me@mail.com?subject=mysubject&body=mybody', params='', query='', fragment='')

EDIT

A little research unearths this thread. Bottom line: Python's URL parsing sucks.
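For comparison, on Python 3 `urllib.parse.urlparse` splits the query off regardless of scheme, so a mailto URL comes apart cleanly (the session above was presumably run on an older Python 2 release). A short check using only the standard library:

```python
from urllib.parse import parse_qs, urlparse

result = urlparse("mailto:me@mail.com?subject=mysubject&body=mybody")
print(result.path)             # me@mail.com
print(result.query)            # subject=mysubject&body=mybody
print(parse_qs(result.query))  # {'subject': ['mysubject'], 'body': ['mybody']}
```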

Alien Life Form

It seems like you might just want to write your own function to do this.

Edit: Here is a sample function (written by a Python noob).

Edit 2, cleanup due to feedback:

from urllib import unquote

test_mailto = 'mailto:me@mail.com?subject=mysubject&body=mybody'

def parse_mailto(mailto):
    result = dict()
    # Drop the "mailto:" scheme prefix
    colon_split = mailto.split(':', 1)
    # Separate the address from the (optional) query string
    quest_split = colon_split[1].split('?', 1)
    result['email'] = unquote(quest_split[0])

    if len(quest_split) > 1:
        for pair in quest_split[1].split('&'):
            # Split each pair only once, so an '=' inside a value survives
            name, _, value = pair.partition('=')
            result[unquote(name)] = unquote(value)

    return result

print parse_mailto(test_mailto)
Robert Peters
    You should probably use `.split(sep, 1)` to limit to one split, and save the results instead of splitting multiple times. Plus, you will need `urllib.unquote()` to decode `%xx` placeholders in the query string keys and variables. – Ferdinand Beyer Jan 30 '12 at 18:26
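Percent-decoding matters for real links, where spaces and line breaks in the subject or body are encoded as `%xx`. A small Python 3 illustration of the same split-then-`unquote` approach; the sample URL is made up, and the code assumes the string starts with `mailto:`.

```python
from urllib.parse import unquote

# Hypothetical link whose subject and body contain encoded spaces and a CRLF
url = "mailto:me@mail.com?subject=Hello%20there&body=Line%201%0D%0ALine%202"
# Assumes the "mailto:" prefix is present; split address from query once
address, _, query = url[len("mailto:"):].partition("?")

fields = {}
for pair in query.split("&"):
    # Split once so an "=" inside a value is kept
    name, _, value = pair.partition("=")
    fields[unquote(name)] = unquote(value)

print(address)               # me@mail.com
print(fields["subject"])     # Hello there
print(repr(fields["body"]))  # 'Line 1\r\nLine 2'
```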

Here is a solution using the re module...

import re

def parse_mailto(a):
    d = {}
    # Address: everything after "mailto:" up to the first "?"
    m = re.search(r'mailto:([^?]+)', a)
    if m:
        d['email'] = m.group(1)
    # Subject and body: the value after "subject="/"body=" up to the next "&"
    for field in ('subject', 'body'):
        m = re.search(r'[?&]' + field + r'=([^&]*)', a)
        if m:
            d[field] = m.group(1)
    return d

This assumes it is in the same format as you posted. You may need to make modifications to better fit your needs.
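If you want to stay with regular expressions but not hardcode the field names, a slightly more general sketch (the name `parse_mailto_re` is hypothetical) captures every `key=value` pair with `re.findall`. Like the version above, it does not percent-decode the values.

```python
import re


def parse_mailto_re(url):
    """Hypothetical regex-based variant: collect every key=value pair."""
    # Group 1: the address part; group 2: the optional query after "?"
    m = re.match(r"mailto:([^?]*)(?:\?(.*))?$", url)
    if m is None:
        raise ValueError("not a mailto: URL")
    result = {"email": m.group(1)}
    # Each pair is a run of non-"&" characters containing one "="
    for key, value in re.findall(r"([^&=]+)=([^&]*)", m.group(2) or ""):
        result[key] = value
    return result


print(parse_mailto_re("mailto:me@mail.com?subject=mysubject&body=mybody"))
# {'email': 'me@mail.com', 'subject': 'mysubject', 'body': 'mybody'}
```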

CoffeeRain

You should use a dedicated library such as this one:

https://pypi.python.org/pypi/urlinfo

and contribute to it or open issues to make Python better ;)

P.S. Don't use Robert Peters' solution, because it is a hack and does not work properly. Also, using a regular expression here is like using the BFG to shoot a small bird.

Vitold S.

I like Alexander's answer, but it is in Python 2! We now get urlparse() and parse_qs() from urllib.parse. Also note that sorting the header in reverse puts it in the order: to, subject, body.

from pathlib import Path
from urllib.parse import parse_qs, urlparse

# link.txt holds a single mailto: URL; strip the trailing newline
url = Path("link.txt").read_text().strip()
parsed_url = urlparse(url)
header = parse_qs(parsed_url.query)
# Merge the comma-separated addresses in the path with any to= fields
header["to"] = header.get("to", []) + [a for a in parsed_url.path.split(",") if a]

for k, v in sorted(header.items(), reverse=True):
    print(f"{k}:", ", ".join(v))

I am just using this as a one-off. When I built an email.message.Message and used msg.as_string() I got some strange results, so I just print the fields. parse_qs returns a list for every key, so each list is joined with ", "; this matters for "to", which can hold several addresses.
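If the aim is just to keep the fields in the order they appear in the link, rather than relying on a reverse sort, `urllib.parse.parse_qsl` returns `(name, value)` pairs in their original order. A sketch in which an inline URL stands in for the contents of link.txt:

```python
from urllib.parse import parse_qsl, urlparse

url = "mailto:me@mail.com?subject=mysubject&body=mybody"
parsed_url = urlparse(url)

# Path addresses first, then the query fields in their original order
fields = [("to", addr) for addr in parsed_url.path.split(",") if addr]
fields += parse_qsl(parsed_url.query)

for k, v in fields:
    print(f"{k}:", v)
# to: me@mail.com
# subject: mysubject
# body: mybody
```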

Louis Maddox

Batteries included: urlparse.

Ferdinand Beyer
    Doesn't work: the urlparse result is `ParseResult(scheme='mailto', netloc='', path='me@mail.com?subject=mysubject&body=mybody', params='', query='', fragment='')`, which does not read subject/body/etc. – Yarin Jan 30 '12 at 17:12
import urllib

query = 'mailto:me@mail.com?subject=mysubject&body=mybody'.partition('?')[2]
print dict((urllib.unquote(s).decode('utf-8') for s in pair.partition('=')[::2])
           for pair in query.split('&'))
# -> {u'body': u'mybody', u'subject': u'mysubject'}
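On Python 3 the same one-liner needs no `.decode('utf-8')`, because `urllib.parse.unquote` already returns `str` and decodes percent-escapes as UTF-8 by default. A sketch of the equivalent:

```python
from urllib.parse import unquote

query = 'mailto:me@mail.com?subject=mysubject&body=mybody'.partition('?')[2]
# Split each "name=value" pair once and percent-decode both halves
parsed = dict(map(unquote, pair.partition('=')[::2]) for pair in query.split('&'))
print(parsed)  # {'subject': 'mysubject', 'body': 'mybody'}
```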
jfs