
I am creating an application that takes data from a text file containing sales data from the Amazon marketplace. The marketplace uses item names that differ from those in our main database. The application accepts the text file as input and needs to check whether each item exists in our database. If an item is not present, I should offer the option to save it to a Master table, or to save it to a Sub item table and map it to a master item. My question is: if the text file has 100+ items, should I hit the database once per item to check whether it exists? Is there a better way of doing this so that we can minimize the database hits?

I have two options that I have used earlier:

  1. Hit the database and check if the item exists in the table.

  2. Fill the data into a DataTable and use DataTable.Select to check if it exists.

Can someone tell me the best way to do this? I have to check two tables (the master table and the subItem table), maybe one at a time. Thanks.

Update:

@Downvoters, please add a comment.

I am not asking how to check whether an item exists in the database; I just want to know the best way of doing that. Should I be hitting the database 1000 times if a file has 1000 items? That's my question.

The current query I use:

if exists (select * from [table] where itemname = @itemname)
    select 'True'
else
    select 'False'
return
  • @user2864740 What if it's 1000 records, or 2000? I have added the ways I know to do that; I don't think `if exists` and `DataTable.Select` need code to be shown here. I just wanted to know if a database select is a good way to do that – Karthik Dec 29 '13 at 07:48
  • There is no code or actual case with query/schema. Thus there is no problem: 'Minimizing database "hits" is achieved by appropriately aggregating queries - where it matters.' Because databases (normally) return normalized data, there is some trade-off point (between hyper-normalization and multiple queries), but this requires an appropriate context. Unless there is a problem, just write it so it works and is maintainable; then when there *is* a problem, focus on it. (And the solution might even reside outside the database.) – user2864740 Dec 29 '13 at 07:48
  • (In the updated case, I would probably use an `IN` with one query [which is easy with a provider like L2S or EF]: it will work fine for several hundred values but needs to be broken up for thousands due to SQL Server limits. The biggest issue, assuming a low-latency connection, is ultimately using applicable correct indexes.) – user2864740 Dec 29 '13 at 07:57
  • @user2864740 I've added the query. My question is: should I execute it 100 times for 100 items? – Karthik Dec 29 '13 at 07:58
  • @Karthik SQL Server has limits on how many parameters can be bound in a query. There are also some (not sure which version, though) "performance steps" for the number of values used with `IN`. I would KISS - for advanced cases there are additional tricks that can be employed. – user2864740 Dec 29 '13 at 08:01
  • @user2864740 Great, thanks. Can you throw me an example or a link based on EF or L2S? It would be of great help. I just want to make sure the application is scalable – Karthik Dec 29 '13 at 08:02
  • @Karthik [Local] Scalability comes with indices and limiting the amount of data fetched during each operation - i.e. SARGABLE queries and paging. L2S/EF are Linq 2 SQL (old) and Entity Framework (EF) which are just two LINQ providers (e.g. support `Contains` which map to `IN`): LINQ is often easier than DataTables/DataSets for queries that have a fixed (non-dynamic) shape. – user2864740 Dec 29 '13 at 08:03
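
A minimal sketch of the `Contains`-to-`IN` approach suggested in the comments above, assuming an Entity Framework context; the `SalesContext`, `Item`, and `ItemName` names are hypothetical placeholders for your own schema:

// Sketch only (assumed names): an EF DbContext called SalesContext with a
// DbSet<Item> Items, where Item has a string ItemName property.
using System.Collections.Generic;
using System.Linq;

public static class ItemLookup
{
    // Returns which of the given names already exist in the Items table,
    // in a single round trip: EF translates Contains into a SQL IN clause.
    public static HashSet<string> FindExisting(SalesContext db, IList<string> names)
    {
        var existing = db.Items
            .Where(i => names.Contains(i.ItemName))
            .Select(i => i.ItemName)
            .ToList();
        return new HashSet<string>(existing);
    }
}

Names missing from the returned set are the ones to offer for saving to the Master or Sub item table; for thousands of names, split the list into chunks to stay under the parameter limits mentioned above.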

3 Answers


Make it a batch of, say, 100 items per call to the database. A stored procedure might help, since repetitive queries have to be fired. If the data does not change frequently, you can also consider caching. I assume you will be making service calls from your .NET application, so ingest an XML document on the back end, in batches. Consider increasing the batch size based on the file size.

If your entire application is local, the batch size may be very high, as there is no network overhead, but still don't make 100 separate calls to the database.
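
A rough sketch of the batching idea in plain ADO.NET, checking up to 100 names per query with a parameterized `IN` list; the table and column names are placeholders:

// Sketch only: checks which of a batch of item names exist, one query per
// batch of up to 100 names instead of one query per name.
// "[table]" and "itemname" are placeholders for your schema.
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

public static class BatchChecker
{
    public static HashSet<string> ExistingNames(string connectionString, IList<string> names, int batchSize = 100)
    {
        var found = new HashSet<string>();
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            for (int offset = 0; offset < names.Count; offset += batchSize)
            {
                var batch = names.Skip(offset).Take(batchSize).ToList();
                // Build @p0, @p1, ... placeholders for this batch.
                var paramNames = batch.Select((_, i) => "@p" + i).ToList();
                var sql = "SELECT itemname FROM [table] WHERE itemname IN ("
                          + string.Join(", ", paramNames) + ")";
                using (var cmd = new SqlCommand(sql, conn))
                {
                    for (int i = 0; i < batch.Count; i++)
                        cmd.Parameters.AddWithValue(paramNames[i], batch[i]);
                    using (var reader = cmd.ExecuteReader())
                        while (reader.Read())
                            found.Add(reader.GetString(0));
                }
            }
        }
        return found;
    }
}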


Try something like this:

SELECT CASE WHEN EXISTS (SELECT * FROM table1 WHERE itemname = @itemname) THEN 1 ELSE 0 END

SELECT CASE WHEN EXISTS (SELECT 1 FROM table1 WHERE itemname = @itemname) THEN 1 ELSE 0 END
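
If it helps, here is a small ADO.NET sketch that runs the check above for a single item via `ExecuteScalar` (the connection string and table name are placeholders); note this still costs one round trip per item:

using System.Data.SqlClient;

public static class SingleItemCheck
{
    // Sketch only: returns true if the (placeholder) table1 contains the item.
    public static bool ItemExists(string connectionString, string itemName)
    {
        const string sql =
            "SELECT CASE WHEN EXISTS (SELECT 1 FROM table1 WHERE itemname = @itemname) "
            + "THEN 1 ELSE 0 END";
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@itemname", itemName);
            conn.Open();
            return (int)cmd.ExecuteScalar() == 1;
        }
    }
}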
  • I am not asking for a query; I already have that. I just want to know the best way of doing it for 100 or 1000 items without many database hits – Karthik Dec 29 '13 at 08:17

(From Chat)

I would create a stored procedure which takes a table-valued parameter (TVP) containing all the items that you want to check. You can then use a join (a couple of options here)* to return a result set of items and whether each one exists or not. You can use TVPs from ADO like this.

It will certainly handle the 100 to 1000 row range mentioned in your post; to be honest, I haven't used it in the 1M+ range. In newer versions of SQL Server, I would prefer TVPs over an XML input parameter, as it is really quite cumbersome to pack the XML in your .NET code and then unpack it again in your SPROC.

(*) Re joins: with the result set, you can either inner join the TVP to your items / product table and check in .NET whether a row is missing, or you can do a left outer join with the TVP as the left table and, e.g., ISNULL() missing items to 0 / 'false', etc.
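
A minimal sketch of this approach; the table type dbo.ItemNameList, the procedure dbo.CheckItems, and the table/column names in the comments below are all hypothetical stand-ins for your own schema:

// Sketch only: passes a batch of item names to a stored procedure as a TVP.
// Assumes these (hypothetical) objects exist on the server:
//   CREATE TYPE dbo.ItemNameList AS TABLE (ItemName NVARCHAR(200) PRIMARY KEY);
//   CREATE PROCEDURE dbo.CheckItems @Names dbo.ItemNameList READONLY AS
//       SELECT n.ItemName,
//              CASE WHEN i.itemname IS NULL THEN 0 ELSE 1 END AS ItemExists
//       FROM @Names n
//       LEFT JOIN [table] i ON i.itemname = n.ItemName;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class TvpChecker
{
    public static Dictionary<string, bool> CheckItems(string connectionString, IEnumerable<string> names)
    {
        // Pack the names into a DataTable shaped like dbo.ItemNameList.
        var table = new DataTable();
        table.Columns.Add("ItemName", typeof(string));
        foreach (var n in names)
            table.Rows.Add(n);

        var result = new Dictionary<string, bool>();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.CheckItems", conn) { CommandType = CommandType.StoredProcedure })
        {
            var p = cmd.Parameters.AddWithValue("@Names", table);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.ItemNameList";

            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    result[reader.GetString(0)] = reader.GetInt32(1) == 1;
        }
        return result;
    }
}

The left outer join variant sketched in the procedure returns one row per input name with a 0/1 flag, so the .NET side learns the existence of the whole batch in a single round trip.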
