I am looking for help on how to speed up the code bit below because as it stands, it is taking too long to perform the task. Any suggestions would be much appreciated. Thanks in advance!
The code bit below is a stripped down version of the actual version but all the important guts should be there. The code works; however, the code is really slow on even a modest size dataset. Needless to say, the primary culprit is the second, nested recordset/SQL call. The LIKE operator is part of the slowdown but I'm more concerned about the nesting and I think the LIKE operator will be required in what we're trying to accomplish. I tried nesting the second SQL call into the first but I didn't see a clean way of doing so.
Platform: Classic ASP, VBScript, MS Access DB
' Go through all people in the table.
sql1 = "SELECT ID, FN, LN, Email FROM Table1"
Call rst1.Open(sql1, cnx, 0, 1)
While Not rst1.EOF
id = rst1.Fields("ID").Value
fn = rst1.Fields("FN").Value
ln = rst1.Fields("LN").Value
email = rst1.Fields("Email").Value
If IsNull(email) Or IsEmpty(email) Then
email = ""
End If
' ----- Figure out if any other people in the table has a similar name or is using the same e-mail address.
' Capture both the ID of those other people as well as figure out the total number of possible duplicates.
sql2 = "SELECT ID FROM Table1"
sql2 = sql2 & " WHERE"
sql2 = sql2 & " ID <> " & id
sql2 = sql2 & " AND"
sql2 = sql2 & " ("
sql2 = sql2 & " FN & ' ' & LN LIKE '%" & Replace(fn & " " & ln, "'", "''") & "%'"
If email <> "" Then
sql2 = sql2 & " OR"
sql2 = sql2 & " Email LIKE '%" & Replace(email, "'", "''") & "%'"
End If
sql2 = sql2 & " )"
Call rst2.Open(sql2, cnx, 0, 1)
numDups = 0
possibleDups = ""
While Not rst2.EOF
numDups = numDups + 1
If possibleDups <> "" Then
possibleDups = possibleDups & ", "
End If
possibleDups = possibleDups & rst2.Fields("ID").Value
Call rst2.MoveNext()
Wend
Call rst2.Close()
' ----- End nest query.
Call Response.Write(fn & " " & ln & " has " & numDups & " possible duplicates (" & possibleDups & ")")
Call rst1.MoveNext()
Wend
Call rst1.Close()
Update 1:
Per request, here is a bit more info on the sample data and the expected output. Table1 is basically a table with the fields: id, fn, ln, email. id is an autogenerated ID representing the entry and fn/ln represent the first/last name, respectively, of the person's entry. Expected output is as coded, e.g.,...
John Doe has 3 possible duplicates (1342, 3652, 98325)
John Doe has 3 possible duplicates (986, 3652, 98325)
John Doe has 3 possible duplicates (986, 1342, 98325)
John Doe has 3 possible duplicates (986, 1342, 3652)
Sam Jones has 0 possible duplicates ()
Jane Smith has 2 possible duplicates (234, 10562)
Jane Smith has 2 possible duplicates (155, 10562)
Jane Smith has 2 possible duplicates (155, 234)
The numbers in parentheses correspond to the id's that appear to be duplicates to each person. A possible duplicate is a scenario in which another entry in the same table appears to share the same name or e-mail. For example, there could be 4 John Doe's and 3 Jane Smith's in the table based on name alone.
Ideally, only one SQL query is required to reduce the roundtrip induced by the recordset call but Access is limited compared to regular SQL Server as far as features and I'm not sure what I'm missing that might help speed this up.
Update 2:
Using the SQL Fiddle by @Abecee, I was able to get a faster query. However, I am now encountering two problems as a result.
The big picture view is still the same. We are looking for possible duplicates based on first name, last name, and e-mail address. However, we also added a search criteria, which are the lines wrapped inside of If searchstring <> "" Then ... End If
. Also, note that the e-mail info is now being pulled from a separate table called EmailTable with the fields id, IndividualID (representing Table1.id), and email.
Mods: The updated query is similar but slightly different from the original query above. I'm not sure if it's better to create a whole new question or not, so I'll just leave this here for now. Let me know if I should move this to its own question.
If the code associated with comment A below is uncommented sql1 = sql1 & " OR (INSTR(E1.Email, E2.Email) > 0) "
, I get an error message: Microsoft JET Database Engine (0x80040E14) Join expression not supported.
The query seems to be coded correctly so what is missing or incorrect?
If the code associated with comment B below is uncommented sql1 = sql1 & " OR INSTR(E1.Email, '" & Replace(searchstring, "'", "''") & "') > 0"
, the query runs but it hangs. I tried dropping the query directly into Access to see if it'll work (e.g., New Query > SQL View) but it also hangs from within Access. I think the syntax and logic are correct but obviously something is askew. Do you see what or why it would hang with this line of code?
Here is the updated query:
sql1 = sql1 & "SELECT "
sql1 = sql1 & " T1.ID, T1.FN, T1.LN, E1.Email, "
sql1 = sql1 & " T2.ID, T2.FN, T2.LN "
sql1 = sql1 & "FROM "
sql1 = sql1 & " ((Table1 T1 LEFT JOIN [SELECT E1.* FROM EmailTable E1 WHERE E1.Primary = True]. AS E1 ON T1.ID = E1.IndividualID)"
sql1 = sql1 & " LEFT JOIN (Table1 T2 LEFT JOIN EmailTable E2 ON T2.ID = E2.IndividualID) "
sql1 = sql1 & " ON "
sql1 = sql1 & " ("
sql1 = sql1 & " T1.ID <> T2.ID "
sql1 = sql1 & " AND "
sql1 = sql1 & " ("
sql1 = sql1 & " ((INSTR(T1.FN, T2.FN) > 0) AND (INSTR(T1.LN, T2.LN) > 0)) "
' A. When the following line is uncommented, error is "Join expression not supported."
' sql1 = sql1 & " OR (INSTR(E1.Email, E2.Email) > 0) "
sql1 = sql1 & " ) "
sql1 = sql1 & " ) "
sql1 = sql1 & " ) "
If searchstring <> "" Then
sql1 = sql1 & " WHERE "
sql1 = sql1 & " INSTR(T1.FN & ' ' & T1.LN, '" & Replace(searchstring, "'", "''") & "') > 0"
' B. When the following line is uncommented, code hangs on the rst1.open() call."
' sql1 = sql1 & " OR INSTR(E1.Email, '" & Replace(searchstring, "'", "''") & "') > 0"
End If
sql1 = sql1 & " ORDER BY T1.LN, T1.FN, T1.ID"
prevID = 0
Call rst1.Open(sql1, cnx, 0, 1)
While Not rst1.EOF
id = rst1.Fields("ID").Value
' Get initial values if we've come across a new ID.
If (id <> prevID) Then
fn = rst1.Fields("T1.FN").Value
ln = rst1.Fields("T1.LN").Value
email = rst1.Fields("Email").Value
If IsNull(email) Or IsEmpty(email) Then
email = ""
End If
' Reset the counter for how many possible duplicates there are.
numDups = 0
' If there is an ID from the second table, then keep track of this possible duplicate.
tmp = rst1.Fields("T2.ID").Value
If IsNumeric(tmp) Then
tmp = CLng(tmp)
Else
tmp = 0
End If
If tmp > 0 Then
numDups = numDups + 1
possibleDups = possibleDups & tmp
End If
End If
' Figure out if we should show this row. Within this logic, we'll also see if there is another possible duplicate.
showrow = False
Call rst1.MoveNext()
If rst1.EOF Then
' Already at the end of the recordset so show this row.
showrow = True
Call rst1.MovePrevious()
Else
If rst1.Fields("T1.ID") <> lngIndividualIDCurrent Then
' Next record is different T1, so show this row.
showrow = True
Call rst1.MovePrevious()
Else
' Next record is the same T1, so don't show this row but note the duplicate.
Call rst1.MovePrevious()
' Also, add the new T2 as a possible duplicate.
tmp = rst1.Fields("T2.ID").Value
If IsNumeric(tmp) Then
tmp = CLng(tmp)
Else
tmp = 0
End If
If tmp > 0 Then
numDups = numDups + 1
If possibleDups <> "" Then
possibleDups = possibleDups & ", "
End If
possibleDups = possibleDups & tmp
End If
End If
End If
If showrow Then
Call Response.Write(fn & " " & ln & " has " & numDups & " possible duplicates (" & possibleDups & ")")
End If
Call rst1.MoveNext()
prevID = id
Wend
Call rst1.Close()