
Environment: MS Office LTSC Pro Plus 2021 under Windows 11 Pro 64

Background

A couple of weeks ago, I had the startling experience of watching my Outlook inbox fill up with thousands of copies of old e-mails stored in various Outlook folders. A Google search informed me that this is not a new problem. It's apparently caused by a bug in Outlook that has never been fixed, and I found nothing about what triggers the bug or how to prevent it. It's a big problem because I use my inbox to keep a backlog of e-mails requiring later attention, and now those e-mails are drowning in a sea of random old e-mails. So I need to find a way to reliably identify and remove those copies.

To do this, I've been learning Outlook VBA (e.g.). I'm experienced in VBA for Excel and Access, but new to it in Outlook. What I've discovered so far is that the property CreationTime is set to the time those copies were made. This has allowed me to determine that my inbox has about 8,000 of those errant copies, made in seven spurts on May 3, with each copied e-mail appearing between about two and seven times in the inbox. I have no idea why it happened those seven times on that day or why it hasn't happened again since, and I live in fear of it suddenly happening again.
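Once the CreationTime values are exported, the "spurts" can be counted mechanically instead of by eye. This is a minimal Python sketch, assuming the CreationTime values have already been exported (e.g. from Excel) and parsed into `datetime` objects; the 10-minute gap that separates two spurts is an arbitrary, hypothetical threshold.

```python
from datetime import datetime, timedelta


def find_spurts(times, max_gap=timedelta(minutes=10)):
    """Group timestamps into bursts ("spurts").

    A new spurt starts whenever the gap to the previous timestamp
    exceeds max_gap (a hypothetical threshold; tune it to the data).
    """
    spurts = []
    for t in sorted(times):
        if spurts and t - spurts[-1][-1] <= max_gap:
            spurts[-1].append(t)  # close enough to the last one: same spurt
        else:
            spurts.append([t])  # gap too large: start a new spurt
    return spurts


if __name__ == "__main__":
    # Illustrative timestamps only, not the real data.
    sample = [
        datetime(2023, 5, 3, 9, 0),
        datetime(2023, 5, 3, 9, 1),
        datetime(2023, 5, 3, 9, 2),
        datetime(2023, 5, 3, 13, 0),
        datetime(2023, 5, 3, 13, 5),
        datetime(2023, 5, 3, 18, 30),
    ]
    for spurt in find_spurts(sample):
        print(spurt[0], "to", spurt[-1], ":", len(spurt), "copies")
```

Sorting first means the export order doesn't matter; each spurt is reported with its first and last CreationTime and its size.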

In a filesystem, one can compare two files to determine whether they are identical. As far as I know, there is no such comparison facility for two e-mails. One has to pick a set of properties to compare and hope for the best. The system I've come up with looks for e-mails with the same values of the MailItem properties SenderName, To, and Subject and the same value of "timestamp", which I define as property SentOn if SenderEmailAddress is one of mine, and otherwise ReceivedTime. I suppose it would be more accurate to compare bodies, but I'm doing this by exporting properties to Excel, where the comparisons are run, and the e-mail bodies are too large for that. If I were more proficient in Outlook VBA, I could perhaps write a routine to do the comparisons there, but I haven't figured out how to do that. I thought of including Size as a proxy for body content, but I discovered that Size can be mysteriously different for two e-mails that appear otherwise identical. More about that below.
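The matching rule described above can be sketched in Python, operating on rows exported from Outlook. This is a minimal sketch, not a tested tool: the `ExportedMail` record, the `MY_ADDRESSES` set, and the choice to keep the copy with the earliest CreationTime (on the assumption that errant copies carry the later CreationTime) are all hypothetical illustrations.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical: replace with your own addresses, in lower case.
MY_ADDRESSES = {"me@example.com"}


@dataclass(frozen=True)
class ExportedMail:
    """One exported row; field names mirror the MailItem properties used."""
    entry_id: str
    sender_name: str
    sender_email: str
    to: str
    subject: str
    sent_on: datetime
    received_time: datetime
    creation_time: datetime


def timestamp(m: ExportedMail) -> datetime:
    # SentOn for mail I sent, ReceivedTime for everything else.
    if m.sender_email.lower() in MY_ADDRESSES:
        return m.sent_on
    return m.received_time


def duplicate_groups(mails):
    """Group rows sharing SenderName, To, Subject and timestamp; keep groups of 2+."""
    groups = defaultdict(list)
    for m in mails:
        groups[(m.sender_name, m.to, m.subject, timestamp(m))].append(m)
    return [g for g in groups.values() if len(g) > 1]


def copies_to_delete(mails):
    """In each group, keep the earliest-created item; return the other EntryIDs."""
    doomed = []
    for group in duplicate_groups(mails):
        ordered = sorted(group, key=lambda m: m.creation_time)
        doomed.extend(m.entry_id for m in ordered[1:])
    return doomed
```

Returning EntryIDs rather than deleting anything keeps the sketch read-only; the list could be reviewed before a separate VBA routine acts on it.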

That is the background of the question I ask below. In addition to an answer to that question, I'd also be grateful if someone could direct me to any technical information about the bug that caused those copies, what triggers it, and how to prevent it from being triggered again.

Hidden data

In my inspection of the data of those copies, I've made two (Edit: three) strange observations:

  1. In order to keep down the size of my pst file, when I archive an e-mail in Outlook, I remove any sizable attachments for storage elsewhere. So I was flabbergasted to find that when Outlook generated these errant copies of old e-mails, many of them included attachments that I had removed. This means that removing attachments from an e-mail does not reduce the size of the pst file at all, but merely hides the attachments!
  2. There is a case of an e-mail whose size is 21 kb. As far as I can tell, it never had an attachment. There are seven copies of it in the inbox, with CreationTime at seven different times on May 3. Six of those copies are 21 kb, but one is 136 kb. I've opened the original and the large copy, and I see no difference in the content. This means that there are 115 kb of data hiding somewhere in the data structure of that larger copy. If these were files, I would open them in Notepad++ to see where the differences are. But I don't know how to open the full content of an e-mail like that. I ran a routine to load one of the e-mails in VBA by its EntryID and then added it to the Watch window to look at its structure. That 115 kb has to be hiding somewhere in there, but I couldn't tell from this where it is.
  3. Edit: It's worse than that. Another thing I've been doing to try to keep down the size of the pst is that when an e-mail is large because of embedded images, I forward the e-mail to myself with the images deleted, and then permanently delete (or so I thought) the original e-mail. By "permanently delete," I mean that I first move the e-mail to a subfolder of "Deleted Items" called "Too big." Then, every so often, I delete this folder. When I delete a folder elsewhere, it gets moved to the "Deleted Items" folder. But when I delete a subfolder of "Deleted Items", I get a message, "Delete this folder and everything in it?" and when I click "Yes", the folder and its contents disappear. But guess what? Included in the errant copies in my inbox are copies of old e-mails that I thought I had removed from Outlook in this way. This means that when I delete a subfolder of "Deleted Items", Outlook does not discard its contents, but hides them somewhere.
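For the "open it in Notepad++" idea in observation 2, one route (the one eventually taken in the comments on Tony Tzu's answer) is to save each message as a .msg file and inspect the raw bytes. Unicode text properties in a .msg file are stored as UTF-16LE, so each ASCII character is followed by a NUL byte, which is why hiding NULs makes the text readable. A minimal Python sketch, assuming the two messages have been saved as .msg files, that lists the longest UTF-16LE text runs in each file so they can be compared; the 32-character minimum is an arbitrary choice.

```python
import re
import sys
from pathlib import Path

# Printable ASCII (plus tab, LF, CR) encoded as UTF-16LE: each character
# byte is followed by a NUL byte. Require at least 32 characters per run.
UTF16_RUN = re.compile(rb"(?:[\x09\x0a\x0d\x20-\x7e]\x00){32,}")


def utf16_text_runs(data: bytes, top: int = 5):
    """Return the `top` longest UTF-16LE text runs as (offset, length_in_chars, preview)."""
    runs = []
    for m in UTF16_RUN.finditer(data):
        text = m.group().decode("utf-16-le")
        runs.append((m.start(), len(text), text[:60]))
    runs.sort(key=lambda r: r[1], reverse=True)
    return runs[:top]


if __name__ == "__main__":
    # Usage: python msg_runs.py small.msg large.msg
    for name in sys.argv[1:]:
        data = Path(name).read_bytes()
        print(f"{name}: {len(data):,} bytes")
        for offset, length, preview in utf16_text_runs(data):
            print(f"  @{offset:#x}: {length:,} chars  {preview!r}")
```

A run that is much longer in the large file than in the small one is a good candidate for where the extra data lives.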

My question

Both the hidden attachments in the first observation above and the hidden 115 kb in the second have to be somewhere in the structure of the MailItem object. And (Edit) the hidden e-mails in the third observation must also be hidden somewhere, but I don't see any evidence of MailItem objects still existing for them. I have two questions about all this:

  • Where is this stuff hidden? Or how can I find out where it is?
  • Is there a way to actually remove it? I could trim gigabytes off the size of my pst file if all those attachments (Edit: and e-mails) that I've been removing for years could actually be removed instead of just hidden.

(Edit 2:) Second question

There's something that doesn't make sense in what I wrote above. The first and third observations tell me that Outlook never discards any data (neither deleted attachments nor the contents of deleted subfolders of Deleted Items) but only hides it. Since I've been keeping almost all my e-mails for over ten years, I haven't been surprised to see the size of my pst file grow to over 10 GB. And I've always assumed that other people who let their Deleted Items folder be purged regularly would have much smaller pst files. If that's correct and if my observations are correct, then it must be that:

  • Outlook does discard the data of e-mails purged from Deleted Items.
  • Outlook does not discard, but only hides, deleted attachments and deleted subfolders of Deleted Items.

That would seem like a strange modus operandi. Is that really how Outlook works, or is there something wrong in my reasoning, or maybe something in my settings that is causing Outlook to keep deleted data?

NewSites
  • Sounds like product-related questions, not programming ones. – Eugene Astafiev May 25 '23 at 19:38
  • @EugeneAstafiev - I've asked a follow-up question in *SuperUser* and would be interested if you have an answer: https://superuser.com/questions/1788571/what-is-the-overhead-in-storage-of-e-mails-in-ms-outlook – NewSites Jun 12 '23 at 02:39

3 Answers


Regarding your second observation under Hidden data, you could save the 136 kb e-mail (msg file) as HTML, which creates a folder containing all the files inside the e-mail. You could also save one of the small, seemingly identical e-mails as HTML and compare the two to see why their sizes differ and what the hidden thing actually is. I'm sorry if this doesn't help much.

Tony Tzu
  • Your suggestion sent me in a direction with interesting results. Saving the two messages as HTML gave me the same result for both: four files with the same file sizes for both messages. But then I saved them as msg, which gave me one file for each, which I opened in Notepad++. After hiding `nul`s, I was able to see differences, mainly that a large block of undecipherable text is much bigger in the larger file. So, that was interesting to see, but I now probably need to look into the tools Dmitry recommended to understand more. – NewSites May 21 '23 at 20:50
  • Interesting indeed, I wasn't expecting equal-sized files for both... Regarding your second inquiry, the file keeping its size even though you deleted attachments sounds just like how PivotTables keep history unless you uncheck the right option in the PivotTable properties. I'm guessing there may be something similar for Outlook PST files... – Tony Tzu May 22 '23 at 15:30
  • Tony: I've asked a follow-up question in *SuperUser* and would be interested if you have an answer: https://superuser.com/questions/1788571/what-is-the-overhead-in-storage-of-e-mails-in-ms-outlook – NewSites Jun 12 '23 at 02:40

Use MFCMAPI (it can be a bit overwhelming unless you are an Extended MAPI developer) or OutlookSpy (I am its author; click the IMessage button) to look at the message properties on the MAPI level. Pay particular attention to the large binary (PT_BINARY) and string (PT_UNICODE) properties. Also make sure there aren't large attachments (GetAttachmentTable tab).
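To make "pay attention to the PT_BINARY and PT_UNICODE properties" concrete: a MAPI property tag is a 32-bit value whose high 16 bits are the property ID and whose low 16 bits are the property type (PT_UNICODE is 0x001F, PT_BINARY is 0x0102). A minimal Python sketch, assuming you have transcribed (tag, size) pairs from a property viewer such as OutlookSpy or MFCMAPI; the sizes in the example are made up and the 10,000-byte threshold is arbitrary.

```python
PT_UNICODE = 0x001F  # UTF-16 string
PT_BINARY = 0x0102   # binary blob


def split_tag(tag: int):
    """Split a 32-bit MAPI property tag into (property ID, property type)."""
    return tag >> 16, tag & 0xFFFF


def large_props(props, min_bytes=10_000):
    """Return (tag as hex, size) for PT_UNICODE/PT_BINARY properties of at least min_bytes."""
    hits = []
    for tag, size in props:
        _, ptype = split_tag(tag)
        if ptype in (PT_UNICODE, PT_BINARY) and size >= min_bytes:
            hits.append((f"0x{tag:08X}", size))
    return sorted(hits, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    sample = [
        (0x007D001F, 230_000),  # PR_TRANSPORT_MESSAGE_HEADERS_W (size made up)
        (0x1000001F, 4_000),    # PR_BODY_W (size made up)
        (0x10090102, 12_000),   # PR_RTF_COMPRESSED (size made up)
        (0x0E080003, 4),        # PR_MESSAGE_SIZE is a PT_LONG, so it is skipped
    ]
    for tag, size in large_props(sample):
        print(tag, size)
```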

Dmitry Streblechenko
  • I'm looking into both those tools. In the meantime, could you take a look at the `Second question` that I've added with an edit to the question and say if you have an answer for that? – NewSites May 21 '23 at 21:17
  • Outlook can compact your PST file (go to the account properties, double-click the PST service, and select "Compact Now"), but since Outlook can reuse the space inside the PST file for other messages and attachments, I am not sure why you'd want to bother. – Dmitry Streblechenko May 21 '23 at 23:48
  • At the OutlookSpy download page, I clicked on "32 and 64 installer", which downloaded a zip containing "OutlookSpySetup.msi". I ran that and it ended with "Completed the Outlook Spy Setup Wizard". But there's no OutlookSpy in the Start menu and the installation folder contains nothing but two dll files, "OutSpy.dll" and "OutSpy64.dll". What do I do now? – NewSites May 21 '23 at 23:54
  • When you then run Outlook, you will have the OutlookSpy ribbon in Outlook. – Dmitry Streblechenko May 22 '23 at 06:57
  • Dmitry: I've asked a follow-up question in *SuperUser* and would be interested if you have an answer: https://superuser.com/questions/1788571/what-is-the-overhead-in-storage-of-e-mails-in-ms-outlook – NewSites Jun 12 '23 at 02:38

Answer about observation #2 in the section Hidden data of my question.

Observation #2 is about two e-mails on my computer, which I will call A and B. E-mail A is in the folder Deleted Items and B is in the Inbox. They have the same values of the MailItem properties SenderName, To, Subject, and ReceivedTime. The CreationTime of A is about 1.5 hours after ReceivedTime, but for B, it is May 3, 2023, about a year later. I therefore conclude that A is the original e-mail received, while B is an errant copy made against my will by the bug in Outlook discussed in the Background section of the question.

The property MailItem.Size is 21,679 bytes for A and 136,367 bytes for B, a difference of 114,688 bytes. This difference in size is the "strange observation" I'm talking about here.

Short answer:

The difference in the size of the copy is mostly due to a string at the beginning of the e-mail headers of just that copy, consisting of 57,330 Unicode characters of unknown (if any) meaning, taking up 114,660 bytes, i.e., accounting for all but 28 bytes of the size difference.
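The arithmetic behind the short answer can be checked directly. This sketch assumes the string is stored as UTF-16 (two bytes per character), which is consistent with PR_TRANSPORT_MESSAGE_HEADERS_W being a Unicode (PT_UNICODE) property.

```python
size_a = 21_679       # MailItem.Size of the original, A
size_b = 136_367      # MailItem.Size of the copy, B
extra_chars = 57_330  # characters in the string prepended to B's headers

size_diff = size_b - size_a
# UTF-16LE uses 2 bytes for each of these ASCII-range characters.
string_bytes = len(("x" * extra_chars).encode("utf-16-le"))
unexplained = size_diff - string_bytes

print(f"{size_diff:,} {string_bytes:,} {unexplained}")  # 114,688 114,660 28
```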

Contents of the string:

The string consists of 735 lines of 76 characters (plus CR LF) each. I suspect it's garbage, but in case it's not and in case someone here knows how to interpret it, I am giving you the first and last three lines. I'll be interested to know if someone can attribute any meaning to it:

yFRnGMDvXrYeo/YqPXqdFNX0Zua7b6v4hsbqYyw3Js4jtSNiwKrk8HAzg8fQ1wHgbWo7u1v9MvrW
NJJkP+lIcbUPbHOOma67wTrdjoeoQWEjj7FM3lThjl4Vc4yevQ/pWR4o8LW3g7XrhtKieSwnfzba
djlSsm07MDGcHcPpVRfuSg/kbSPbLDVrmTwAdDt7dY7aNN0mHJOQMDr0444rz3QNBttPhuNS0yPd
...
XPH32enRwTV2kZukfF/WI0fU/GOoafHaRKCVs4Cjlj94A5xlTkHBAzkda77xNqf9i6dE/h1jPfXD
rMAo3Dyzz8y19Y6LF4DtPDFr4d8C+HktvDk0ckcNq1qWlnjOdwVZAZBux35/i96828F/Cfw7rWoa
hZ6zrWpRauoV/wCx9MRbG5ghA+VnMgZ8MMHhl4INfM4mMcTUvA641JRjr0/rU5Cy0u71/SLDV7SS
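One hypothesis worth testing before writing the string off as garbage: 76 characters is the maximum line length that MIME (RFC 2045) allows for base64-encoded content, and the sample lines above use only the base64 alphabet (A-Z, a-z, 0-9, +, /). If so, the string may be an encoded fragment of some binary data rather than random noise. A minimal Python sketch that checks the quoted lines; it only shows that they are valid base64, not what they encode.

```python
import base64
import binascii

# First and last three lines of the string, as quoted above.
SAMPLE = [
    "yFRnGMDvXrYeo/YqPXqdFNX0Zua7b6v4hsbqYyw3Js4jtSNiwKrk8HAzg8fQ1wHgbWo7u1v9MvrW",
    "NJJkP+lIcbUPbHOOma67wTrdjoeoQWEjj7FM3lThjl4Vc4yevQ/pWR4o8LW3g7XrhtKieSwnfzba",
    "djlSsm07MDGcHcPpVRfuSg/kbSPbLDVrmTwAdDt7dY7aNN0mHJOQMDr0444rz3QNBttPhuNS0yPd",
    "XPH32enRwTV2kZukfF/WI0fU/GOoafHaRKCVs4Cjlj94A5xlTkHBAzkda77xNqf9i6dE/h1jPfXD",
    "rMAo3Dyzz8y19Y6LF4DtPDFr4d8C+HktvDk0ckcNq1qWlnjOdwVZAZBux35/i96828F/Cfw7rWoa",
    "hZ6zrWpRauoV/wCx9MRbG5ghA+VnMgZ8MMHhl4INfM4mMcTUvA641JRjr0/rU5Cy0u71/SLDV7SS",
]


def is_mime_base64_line(line: str) -> bool:
    """True if the line fits MIME's 76-character limit and decodes as strict base64."""
    if len(line) > 76:
        return False
    try:
        base64.b64decode(line, validate=True)
    except binascii.Error:
        return False
    return True


if __name__ == "__main__":
    print(all(is_mime_base64_line(line) for line in SAMPLE))
    # Every 4 base64 characters encode 3 bytes, so a 76-character line is 57 bytes.
    print([len(base64.b64decode(line)) for line in SAMPLE])
    # 735 lines of 76 characters plus CR LF account for the 57,330 characters.
    print(735 * (76 + 2))
```

If the full string decodes cleanly as well, the first few decoded bytes might hint at what kind of data it is.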

In case it's relevant to know, the sender is at a "ymail.com" address and the headers include a field X-YMail-OSG, which has a similar structure of 2,410 characters that look meaningless but are apparently a Yahoo anti-spam device [1, 2].

I have not found any match between the contents of the field X-YMail-OSG and the much larger string at the beginning of the headers of the large copy.

How I found it:

I checked out both tools recommended in the answer by Dmitry Streblechenko, MFCMAPI and OutlookSpy. The first was indeed difficult to understand; the second could also benefit from better documentation, but is pretty useful.

In OutlookSpy, I first tried IMessage > Save to File. The resulting file for A was smaller than the size of the e-mail in Outlook, so it definitely did not contain all the data of the e-mail. Then I noticed that in OutlookSpy's window for IMessage, in column "Value", three properties show a value of "MAPI_E_NOT_ENOUGH_MEMORY": PR_BODY_W, PR_RTF_COMPRESSED, and PR_TRANSPORT_MESSAGE_HEADERS_W. The details panel on the right side of the window does show their values when each is selected, but in the saved file, their values also show as "MAPI_E_NOT_ENOUGH_MEMORY". That's why the saved file is smaller than the e-mail, and it's not helpful.

I took a closer look at those three properties. PR_BODY_W is the property tag associated with MAPI property PidTagBody, which presumably corresponds to MailItem.Body. PR_RTF_COMPRESSED is the property tag associated with MAPI property PidTagRtfCompressed, which presumably corresponds to MailItem.RTFBody. I didn't find any differences between A and B in those two properties.

PR_TRANSPORT_MESSAGE_HEADERS_W is an associated property of MAPI property PidTagTransportMessageHeaders. The linked documentation says it's related to "message header information for inbound messages." I don't see any MailItem property that might be associated with this, and apparently, message headers are not included in the Outlook Object Model, which seems strange to me since the headers contain important information about inbound e-mails.
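On the object-model point: Outlook's `PropertyAccessor.GetProperty` can read MAPI properties that have no MailItem equivalent, given a DASL name of the form `http://schemas.microsoft.com/mapi/proptag/0x` followed by the 8-digit hex property tag, and that is a common way to read the transport headers from VBA. A small Python sketch that builds those names for the three properties discussed here from their standard MAPI tags. The VBA call is shown only in a comment; I haven't verified whether it copes with a 57,000-character value.

```python
# Standard MAPI property tags (high 16 bits: property ID, low 16 bits: type).
PROP_TAGS = {
    "PR_BODY_W": 0x1000001F,
    "PR_RTF_COMPRESSED": 0x10090102,
    "PR_TRANSPORT_MESSAGE_HEADERS_W": 0x007D001F,
}


def dasl_name(tag: int) -> str:
    """Build the DASL name that Outlook's PropertyAccessor accepts for a property tag."""
    return f"http://schemas.microsoft.com/mapi/proptag/0x{tag:08X}"


if __name__ == "__main__":
    for name, tag in PROP_TAGS.items():
        print(name, dasl_name(tag))
    # In Outlook VBA, the headers could then be read with something like:
    #   headers = mail.PropertyAccessor.GetProperty( _
    #       "http://schemas.microsoft.com/mapi/proptag/0x007D001F")
```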

In e-mail A, in OutlookSpy's window for IMessage, with PR_TRANSPORT_MESSAGE_HEADERS_W selected, in the details pane on the right side, the field "Symbol" looks like a set of e-mail headers. The field "Value", when viewed as text, looks like those headers with a space between each character, which suggests that the data are stored as UTF-16 Unicode (two bytes per character). So I dropped out of OutlookSpy and got the headers from "Outlook > File > Info > Properties > Internet headers". I did that for both A and B and copied the headers to Notepad++, where I ran the plugin "ComparePlus" on them. There I found that the headers of the two e-mails are identical except for a string of 57,330 characters (in which I saw no meaning or pattern) tacked onto the beginning of the headers of B.
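The ComparePlus step can also be reproduced with Python's standard `difflib`, which reports where one header block contains lines the other lacks and how many characters they add. A minimal sketch, assuming the two sets of Internet headers have been copied into strings; line endings are kept so the character count includes CR LF, as the 57,330 figure does.

```python
import difflib


def added_lines(a_text: str, b_text: str):
    """Return (line index in b, lines) for each block present in b but not in a."""
    a_lines = a_text.splitlines(keepends=True)
    b_lines = b_text.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    blocks = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if op in ("insert", "replace"):
            blocks.append((j1, b_lines[j1:j2]))
    return blocks


if __name__ == "__main__":
    # Illustrative stand-ins for the headers of A and B.
    a = "Received: from x\r\nFrom: y\r\nSubject: z\r\n"
    b = "GARBAGE1\r\nGARBAGE2\r\n" + a
    for start, lines in added_lines(a, b):
        print(f"line {start}: {len(lines)} extra lines, {sum(map(len, lines)):,} chars")
```

When reading saved header files, open them with `newline=""` (e.g. `open(path, newline="")`) so Python keeps the CR LF pairs instead of translating them to LF.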

Conclusion:

I suspect that the large string at the beginning of the headers of B is just garbage caused by some sort of corruption when Outlook's bug generated the errant copy, e-mail B.

It's interesting that, if message headers really are not included in the Outlook Object Model, I would never have found that garbage string by looking at the entire contents of the Outlook objects. The only way to find it was either to look at the MAPI property PR_TRANSPORT_MESSAGE_HEADERS_W or to look at the headers of the e-mail in "Outlook > File > Info > Properties > Internet headers".

NewSites