
For developing a mail client I need a very large mbox test file containing as many emails as possible, preferably more than 100,000 emails (over 10 GB).

It should be realistic mail data, since I want to test not only performance but also mail filters and search.

Thanks in advance for any hints where to get stuff like that.

pintpint
  • Set up an open mail server without spam filtering and the address `info@the-domain.com`. Register this address to a few porn sites and wait :-) – Emil Vikström Jun 24 '12 at 13:25
  • Please see [this OpenData page](http://opendata.stackexchange.com/q/4517/1511) for interesting Email resources – philshem Aug 24 '15 at 15:21

3 Answers


You can collect .mbox text files using a search engine. For example, a Google search for `filetype:mbox pipermail` turns up plenty of .mbox data. Instead of `pipermail`, `from` also works as a search term.

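Individual .mbox files can be concatenated, with a newline added between them so that the next file's `From ` separator line follows a blank line: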

```shell
cat mboxfile1 > mboxfile
echo >> mboxfile
cat mboxfile2 >> mboxfile
```
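The same idea extends to many downloaded files at once. Here is a minimal POSIX `sh` sketch; `merge_mboxes` and `count_messages` are hypothetical helper names, not an existing tool. The message count assumes body lines beginning with `From ` were quoted as `>From ` in the source files, as mbox writers are supposed to do.

```shell
# merge_mboxes OUT IN...  (hypothetical helper)
# Concatenates mbox files into OUT, appending a newline after each input so
# the next file's "From " separator line is preceded by a blank line.
merge_mboxes() {
  out=$1
  shift
  : > "$out"                          # create or truncate the output file
  for f in "$@"; do
    [ "$f" = "$out" ] && continue     # never append the output to itself
    cat "$f" >> "$out"
    echo >> "$out"
  done
}

# count_messages FILE  (hypothetical helper)
# Counts "From " separator lines; only accurate if body lines starting
# with "From " were quoted as ">From " when the files were written.
count_messages() {
  grep -c '^From ' "$1"
}
```

For example, `merge_mboxes all.mbox downloads/*.mbox` followed by `count_messages all.mbox` gives a quick check on how close the merged file is to the 100,000-message target.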

P.S. It's not the data that's unethical; it's what you do with it. Please act ethically!

philshem

A couple of other options:

Enron Email Corpus, with 210 GB of emails. It comes in several email formats, but they should be easy to read.

Enron email data publicly released as part of FERC's Western Energy Markets investigation converted to industry standard formats by EDRM. The data set consists of 1,227,255 emails with 493,384 attachments covering 151 custodians. The email is provided in Microsoft PST, IETF MIME, and EDRM XML formats.
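Neither of those formats is mbox, so the Enron data needs converting first. For the PST files, libpst's `readpst` is a commonly used mbox converter. For the MIME version, assuming it unpacks to one RFC 822 message per `.eml` file (an assumption about the archive layout, not something verified here), a rough sketch follows. `eml_to_mbox` is a hypothetical helper; it writes a placeholder envelope line instead of deriving the sender and date from the headers, and applies mboxrd-style quoting to lines that start with `From `.

```shell
# eml_to_mbox OUT FILE...  (hypothetical helper)
# Appends each single-message file to OUT in mboxrd style:
# a "From " separator, the message with carriage returns stripped and
# lines matching /^>*From / quoted with one extra ">", then a blank line.
eml_to_mbox() {
  out=$1
  shift
  for f in "$@"; do
    # Placeholder envelope sender/date; a real converter would use the headers
    echo "From MAILER-DAEMON Thu Jan  1 00:00:00 1970" >> "$out"
    # awk prints every record with a trailing newline, even an unterminated last line
    awk '{ sub(/\r$/, "")
           if ($0 ~ /^>*From /) $0 = ">" $0
           print }' "$f" >> "$out"
    echo >> "$out"
  done
}
```

Because `eml_to_mbox` appends rather than truncates, it can be called repeatedly (for example once per custodian directory) to build up one large file.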

Apache Software Foundation Public Mail Archives (200 GB)

A collection of all publicly available Apache Software Foundation mail archives as of July 11, 2011

This collection contains all publicly available email archives from the ASF's 80+ projects

Amazon link

philshem

Maybe you could take your own mailbox and replicate it multiple times. For example, set up a mail account and copy all your emails into it several times using IMAP, or duplicate them at the filesystem level; which approach works depends on the data format you are using.
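If the mailbox is already a single mbox file, the filesystem route can be as simple as concatenating it with itself. A minimal sketch, using a hypothetical `replicate_mbox` helper:

```shell
# replicate_mbox SRC DST COPIES  (hypothetical helper)
# Writes COPIES back-to-back copies of the mbox file SRC into DST,
# with a newline after each copy so every "From " line follows a blank line.
replicate_mbox() {
  src=$1
  dst=$2
  copies=$3
  : > "$dst"                    # start from an empty output file
  i=0
  while [ "$i" -lt "$copies" ]; do
    cat "$src" >> "$dst"
    echo >> "$dst"
    i=$((i + 1))
  done
}
```

To aim for roughly 10 GB, the copy count can be derived from the file size, e.g. `copies=$(( 10 * 1024 * 1024 * 1024 / $(wc -c < my.mbox) + 1 ))`. Keep in mind that every copy carries the same `Message-ID` headers, so a client that deduplicates on `Message-ID` may treat the copies as one message, and search results will contain exact duplicates.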

Andrew