7

issue

  1. Outlook 2016 corrupted while I was moving items from an online archive into a pst file.
  2. The PST file has been recovered .... but many items (~7000) are duplicated 5 times
  3. There are a range of item types, standard messages, meeting requests etc

what I tried
I looked at existing solutions and tools, including:

  1. duplicate removal tools - none of which were free other than a trial option to remove 10 items at a time.
  2. A variety of code solutions including:
    Jacob Hilderbrand's effort which runs from Excel
    Macro in Outlook to delete duplicate emails-

I decided to go the code route as it was relatively simple and to gain more control over how the duplicates were reported.

I will post my self solution below as it may help others.

I would like to see other potential approaches (perhaps powershell) to fixing this problem which may be better than mine.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
brettdj
  • 54,857
  • 16
  • 114
  • 177
  • 1
    Here is an updated link for the Jacob Hilderbrand article: http://www.excelandaccess.com/delete-duplicate-emails-via-ms-excel-vba/ – dan-O May 18 '17 at 16:46

4 Answers4

14

The approach below:

  1. Provides users with a prompt to select the folder to process
  2. Checks duplicates on the base of Subject, Sender, CreationTime and Size
  3. Moved (rather than delete) any duplicates into a sub-folder (removed items) of the folder being processed.
  4. Create a CSV file - stored under the path in StrPath to create a external reference to Outlook of the emails that have been moved.

Updated: Checking for size surprisingly missed a number of dupes, even for otherwise identical mail items. I have changed the test to subject and body

Tested on Outlook 2016

Const strPath = "c:\temp\deleted msg.csv"
Sub DeleteDuplicateEmails()

Dim lngCnt As Long
Dim objMail As Object
Dim objFSO As Object
Dim objTF As Object

Dim objDic As Object
Dim objItem As Object
Dim olApp As Outlook.Application
Dim olNS As NameSpace
Dim olFolder As Folder
Dim olFolder2 As Folder
Dim strCheck As String

Set objDic = CreateObject("scripting.dictionary")
Set objFSO = CreateObject("scripting.filesystemobject")
Set objTF = objFSO.CreateTextFile(strPath)
objTF.WriteLine "Subject"

Set olApp = Outlook.Application
Set olNS = olApp.GetNamespace("MAPI")
Set olFolder = olNS.PickFolder

If olFolder Is Nothing Then Exit Sub

On Error Resume Next
Set olFolder2 = olFolder.Folders("removed items")
On Error GoTo 0

If olFolder2 Is Nothing Then Set olFolder2 = olFolder.Folders.Add("removed items")


For lngCnt = olFolder.Items.Count To 1 Step -1

Set objItem = olFolder.Items(lngCnt)

strCheck = objItem.Subject & "," & objItem.Body & ","
strCheck = Replace(strCheck, ", ", Chr(32))

    If objDic.Exists(strCheck) Then
       objItem.Move olFolder2
       objTF.WriteLine Replace(objItem.Subject, ", ", Chr(32))
    Else
        objDic.Add strCheck, True
    End If
Next

If objTF.Line > 2 Then
    MsgBox "duplicate items were removed to ""removed items""", vbCritical, "See " & strPath & " for details"
Else
    MsgBox "No duplicates found"
End If
End Sub
brettdj
  • 54,857
  • 16
  • 114
  • 177
  • Updated test to look at subject and body only. – brettdj Feb 18 '16 at 04:43
  • Thank you very much for this code!! Wish I had found it two days ago!! A note for anyone that uses this in the future: in my case, I had about 25 folders to process, so was too lazy to alter the code to list folders and let user pick 'yes/no'. Two strange things: (1) At some point as I deleted the new folders, I got a cryptic '...permissions...' msg (not from this code), but after I deleted all from 'Trash', issue went away. (2) The Gremlin inside the PC *sometimes* thought that I didn't want to delete the 'removed..' folder and it would reappear! – Wayne G. Dunn Apr 27 '17 at 20:45
  • This is FANTASTIC. I made little tweaks here and there as it processed thousands of emails. Helped a lot. Thanks and props! – Button 108 Aug 23 '17 at 20:43
  • Just wanted to point out that this also removes HTML mails with duplicate subjects but different (HTML-)content. You need to check the HTMLBody, too, for those - but that property only exists for actual MailItems, not for e.g. Calendar items. – mbirth Sep 29 '17 at 12:11
  • I have Outlook 2016, but it's on Office 365. I still see duplicates and the count is way lower than what I have in the folder. – Rick Mar 25 '20 at 20:52
1

Here's a script that takes advantage of sorting emails to check for duplicates much more efficiently.

There's no need to maintain a giant dictionary of every email you've seen if you are processing emails in a deterministic order (e.g. received date). Once the date changes, you know you'll never see another email with the prior date, therefore, they won't be duplicates, so you can clear your dictionary on each date change.

This script also takes into account the fact that some items use an HTMLBody for the full message definition, and others don't have that property.

Sub DeleteDuplicateEmails()
    Dim allMails As Outlook.Items
    Dim objMail As Object, objDic As Object, objLastMail As Object
    Dim olFolder As Folder, olDuplicatesFolder As Folder
    Dim strCheck As String
    Dim received As Date, lastReceived As Date        

    Set objDic = CreateObject("scripting.dictionary")
    With Outlook.Application.GetNamespace("MAPI")
        Set olFolder = .PickFolder
    End With
    If olFolder Is Nothing Then Exit Sub

    On Error Resume Next
    Set olDuplicatesFolder = olFolder.Folders("Duplicates")
    On Error GoTo 0
    If olDuplicatesFolder Is Nothing Then Set olDuplicatesFolder = olFolder.Folders.Add("Duplicates")

    Debug.Print "Sorting " & olFolder.Name & " by ReceivedTime..."
    Set allMails = olFolder.Items
    allMails.Sort "[ReceivedTime]", True
    Dim totalCount As Long, index As Long
    totalCount = allMails.count
    Debug.Print totalCount & " Items to Process..."

    lastReceived = "1/1/1987"
    For index = totalCount - 1 To 1 Step -1
        Set objMail = allMails(index)
        received = objMail.ReceivedTime
        If received < lastReceived Then
            Debug.Print "Error: Expected emails to be in order of date recieved. Previous mail was " & lastReceived _
                & " current is " & received
            Exit Sub
        ElseIf received = lastReceived Then
            ' Might be a duplicate track mail contents until this recieved time changes.
            ' Add the last mail to the dictionary if it hasn't been tracked yet
            If Not objLastMail Is Nothing Then
                Debug.Print "Found multiple emais recieved at " & lastReceived & ", checking for duplicates..."
                objDic.Add GetMailKey(objLastMail), True
            End If
            ' Now check the current mail item to see if it's a duplicate
            strCheck = GetMailKey(objMail)
            If objDic.Exists(strCheck) Then
                Debug.Print "Found Duplicate: """ & objMail.Subject & " on " & lastReceived
                objMail.Move olDuplicatesFolder
                DoEvents
            Else
                objDic.Add strCheck, True
            End If
            ' No need to track the last mail, since we have it in the dictionary
            Set objLastMail = Nothing
        Else
            ' This can't be a duplicate, it has a different date, reset our dictionary
            objDic.RemoveAll
            lastReceived = received
            ' Keep track of this mail in case we end up needing to build a dictionary
            Set objLastMail = objMail
        End If

        ' Progress update
        If index Mod 10 = 0 Then
            Debug.Print index & " Remaining..."
        End If
        DoEvents
    Next
    Debug.Print "Finished moving Duplicate Emails"
End Sub

And the helper function referenced above for "uniquely identifying" an email. Adapt as needed, but I think if the subject and full body are the same, there's no point in checking anything else. Also works for calendar invites, etc.:

Function GetMailKey(ByRef objMail As Object) As String
    On Error GoTo NoHTML
    GetMailKey = objMail.Subject & objMail.HTMLBody
    Exit Function
BodyKey:
    On Error GoTo 0
    GetMailKey = objMail.Subject & objMail.Body
    Exit Function
NoHTML:
    Err.Clear
    Resume BodyKey
End Function
Alain
  • 26,663
  • 20
  • 114
  • 184
  • I am using it in Outlook 2016, with the data on Office365. I am getting a way lower count of emails in my folder, than what I actually have. I get an object reference error message on some items on objMail.ReceivedTime. – Rick Mar 25 '20 at 21:03
1

I've wrote a VBA script called "Outlook Duplicated Items Remover"

The source code is available on GitHub

It will find all duplicated items in a folder and its subfolders and move them to a dedicated folder

Arik
  • 5,266
  • 1
  • 27
  • 26
0

I simplified the duplicate search as in my case I imported multiple duplicates from PST files but the full mail body didn't match. I don't know why, as I am sure those mail are true duplicates.

My simplification is to match ONLY the receive TIME STAMP and the SUBJECT.

I added an error exception for an error I got some times on the function: Set olDuplicatesFolder = olFolder.Folders("Duplicates").
I did a different format for the debug.print messages.

Attribute VB_Name = "DelDupEmails_DATE_SUBJECT"
Sub DeleteDuplicateEmails_DATE_SUBJECT()
Dim allMails As Outlook.Items
Dim objMail As Object, objDic As Object, objLastMail As Object
Dim olFolder As Folder, olDuplicatesFolder As Folder
Dim strCheck As String
Dim received As Date, lastReceived As Date

Set objDic = CreateObject("scripting.dictionary")
With Outlook.Application.GetNamespace("MAPI")
    Set olFolder = .PickFolder
End With
If olFolder Is Nothing Then Exit Sub

On Error Resume Next
Set olDuplicatesFolder = olFolder.Folders("Duplicates")
On Error GoTo 0
If olDuplicatesFolder Is Nothing Then Set olDuplicatesFolder = olFolder.Folders.Add("Duplicates")

Debug.Print "Sorting " & olFolder.Name & " by ReceivedTime..."
Set allMails = olFolder.Items
allMails.Sort "[ReceivedTime]", True
Dim totalCount As Long, index As Long
totalCount = allMails.Count
Debug.Print totalCount & " Items to Process..."
'MsgBox totalCount & " Items to Process..."

lastReceived = "1/1/1987"
For index = totalCount - 1 To 1 Step -1
    Set objMail = allMails(index)
    On Error Resume Next
    received = objMail.ReceivedTime
    On Error GoTo 0
    If received < lastReceived Then
        Debug.Print "Error: Expected emails to be in order of date recieved. Previous mail was " & lastReceived _
            & " current is " & received
        Exit Sub
    ElseIf received = lastReceived Then
        ' Might be a duplicate track mail contents until this recieved time changes.
        ' Add the last mail to the dictionary if it hasn't been tracked yet
        If Not objLastMail Is Nothing Then
            Debug.Print olFolder & " : Found multiple emails recieved at " & lastReceived & ", checking for duplicates..."
            'MsgBox "Found multiple emails recieved at " & lastReceived & ", checking for duplicates..."
            objDic.Add GetMailKey(objLastMail), True
        End If
        ' Now check the current mail item to see if it's a duplicate
        strCheck = GetMailKey(objMail)
        If objDic.Exists(strCheck) Then
            Debug.Print "#" & index & " - Duplicate: " & lastReceived & " " & objMail.Subject
            'Debug.Print "Found Duplicate: """ & objMail.Subject & " on " & lastReceived
            'MsgBox "Found Duplicate: """ & objMail.Subject & " on " & lastReceived
            objMail.Move olDuplicatesFolder
            DoEvents
        Else
            objDic.Add strCheck, True
        End If
        ' No need to track the last mail, since we have it in the dictionary
        Set objLastMail = Nothing
    Else
        ' This can't be a duplicate, it has a different date, reset our dictionary
        objDic.RemoveAll
        lastReceived = received
        ' Keep track of this mail in case we end up needing to build a dictionary
        Set objLastMail = objMail
    End If

    ' Progress update
    If index Mod 100 = 0 Then
        Debug.Print index & " Remaining... from " & olFolder
        'MsgBox index & " Remaining..."
    End If
    DoEvents
Next
Debug.Print "Finished moving Duplicate Emails"
MsgBox "Finished moving Duplicate Emails"

End Sub

Function GetMailKey(ByRef objMail As Object) As String
  On Error GoTo NoHTML
  'GetMailKey = objMail.Subject & objMail.HTMLBody
  GetMailKey = objMail.Subject ' & objMail.HTMLBody
  Exit Function
BodyKey:
  On Error GoTo 0
  'GetMailKey = objMail.Subject & objMail.Body
  GetMailKey = objMail.Subject ' & objMail.Body
  Exit Function
NoHTML:
  Err.Clear
  Resume BodyKey
End Function
Community
  • 1
  • 1