0

New to databases and am can read vb.net or c# code. I want to match filenames in a folder to a master list of those filenames with their descriptions and create a subset to display in a datagridview control. "Filename" is a primary key in both tables:

Table "MasterList" <-- 50,000 records
  Filename 
  Title
 Table "FoundFiles" <-- 2,000 records
  Filename 

After weeks of reading, I came up with the following and both SQL queries "seem" to give the same result. Am I doing it correctly and is one query "better" than another? I am also confused about the order of the tables in the query. Which is "left" and which is "right" and if I have them correct in the "JOIN"

Dim con As New SqlCeConnection("Data Source=|DataDirectory|\data.sdf")
Dim daMasterList As SqlCeDataAdapter
Dim daSubset As SqlCeDataAdapter
Dim ds As DataSet 

Sub table_load(database As String)
  ds = New DataSet
  daMasterList = New SqlCeDataAdapter(("SELECT * FROM MasterList"), con)
  daMasterList.Fill(ds, "MasterList")
  daSubset = New SqlCeDataAdapter(("SELECT FoundFiles.Filename, MasterList.Title"                                         & _" FROM FoundFiles" & _
                                    " INNER JOIN MasterList" & _
                                    " ON FoundFiles.Filename = Masterlist.Filename"), con)
  daSubset.Fill(ds, "FoundFiles")
  ' load the table into a DataGridView control
  dgvFileList.DataSource = ds.Tables("FoundFiles") 'existing files with titles from the master list
End Sub

This also gives me what I am looking for:

daSubset = New SqlCeDataAdapter(("SELECT * FROM MasterList" & _ 
                                 " INNER JOIN FoundFiles" 
                                 " ON MasterList.Filename = FoundFiles.Filename"), con)

wanted result:

  FoundFiles              MasterList              reduced master or new table
filename1.zip      filename1.zip, accounting       filename1.zip, accounting 
filename2.zip -->  filename2.zip, birth certs  --> filename2.zip, birth certs
filename5.zip      filename3.zip, spreadsheets     filename5.zip, websites
filename8.zip      filename4.zip, real estate      filename8.zip, archived
....               filename5.zip, websites
                   filename6.zip, games
                   filename7.zip, presentations
                   filename8.zip, archived
FDecker
  • 41
  • 1
  • 10

2 Answers2

2

If I understand your question correctly, it pretty much boils down to "What is the difference between these two SQL Queries?"

SELECT FoundFiles.Filename, MasterList.Title
FROM FoundFiles
INNER JOIN MasterList
ON FoundFiles.Filename = Masterlist.Filename

SELECT * 
FROM MasterList 
INNER JOIN FoundFiles
ON MasterList.Filename = FoundFiles.Filename

The first difference is that in the first query, you explicitly state which columns you want returned: FoundFiles.Filename, MasterList.Title but in the second you use * which means "give me all the columns." If those are the only columns that exist, you get the same result.

The second difference is the order of the tables, one has FROM A JOIN B and the other has FROM B JOIN A. The type of join you are doing is an inner join, which means it only returns results that are found in both tables. In this case, the order of the join doesn't really matter. As for the question of which is "left" and "right", it follows the pattern FROM left JOIN right. Again, since you're getting only results that exist in both, the order doesn't change the result.

As for which is "correct", they're both fully syntactically correct. The first version where you explicitly list the columns you want is probably the slightly better option. It clearly states what you expect to get as a result, which can help with optimization and code readability. You might want to search around for information about best practices for SQL queries. Here is a Stack Overflow discussion on use of * in queries.

DavidP
  • 613
  • 5
  • 12
  • Perfect! I don't know why I didn't see this before. The link to "why select is considered harmful" was also very helpful. Thank you to both of you. – FDecker Aug 02 '18 at 03:10
1

There isn't a difference between the 2 SQL statements that you have written with the exception, first SQL limits the columns to wanted columns (recommended) while second includes all columns from both tables (even the common fileName columns come from both tables and thus duplicated). First one is better.

As per the LEFT and RIGHT simply check which one is on the Left or Right:

     LEFT                  RIGHT
FROM MasterList INNER JOIN FoundFiles

In your query though, you don't want LEFT or RIGHT, you want INNER JOIN as you did. That means "give me the rows that are found on both of the tables - based on matching key which is FileName in your case".

If say you want to have all files from MasterList, regardless they are also in foundFiles or not then you would use a LEFT [OUTER] JOIN:

FROM MasterList LEFT JOIN FoundFiles

If masterfiles used on the right, then it would be a RIGHT JOIN (rarely used).

Cetin Basoz
  • 22,495
  • 3
  • 31
  • 39