
I am developing a SQL stored procedure and I want to return the number of rows for each table. How could I rewrite this statement so that it lists the number of rows from each of the tables below?

SELECT COUNT(*)
FROM [test_setup_details_form_view] [tsdf]
JOIN [test_setup_header_form_view] 
    ON [test_setup_header_form_view].[test_setup_header_id] 
        = [tsdf].[test_setup_header_id]
JOIN [test_header_rv] [th] with(nolock) 
    ON [th].[test_setup_header_id] 
        = [test_setup_header_form_view].[test_setup_header_id]
JOIN [test_details_answers_expanded_view] [tdae] 
    ON [tdae].[test_setup_details_id] = [tsdf].[test_setup_details_id] 
        AND [th].[test_header_id] = [tdae].[test_header_id]
JOIN [event_log_rv] [e] 
    ON [e].[event_log_id] = [tdae].[event_log_id]

When I execute this statement, it just gives me the total rows after all of the joins.
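To illustrate the intent: what I am after is one count per table, something like the following sketch (separate counts per table, shown here for three of the five tables; the remaining ones would follow the same pattern):

```sql
-- Sketch: one row per table, each count taken irrespective of the joins
SELECT 'test_setup_details_form_view' AS table_name, COUNT(*) AS row_count
FROM [test_setup_details_form_view]
UNION ALL
SELECT 'test_setup_header_form_view', COUNT(*)
FROM [test_setup_header_form_view]
UNION ALL
SELECT 'test_header_rv', COUNT(*)
FROM [test_header_rv];
```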

Jacco
salvationishere
  • By definition, since you have inner joins, there are the same number rows resulting from the join in every single table. Are you just trying to get some kind of report of the table counts in each table, even the rows that aren't involved in the join? – Aaron Bertrand Jan 18 '13 at 20:29
  • You're joining everything together, so how else would it work? I think you should edit your question to show, individually, how you'd get the count from each table you have in question. e.g. to get the total count from tsdf, do.... and to get the total records from tshfv, do... Then I think the question (and answer) will be more clear. – Eli Gassert Jan 18 '13 at 20:29
  • Can't you just add up the count from each table separately? Once you do the joins, it's all aggregated into a single result set; at which point it's too late (or so I'd think). – A-Dubb Jan 18 '13 at 20:31
  • Are you trying to get counts of the tables individually? Or are you trying to get counts of the rows used in the joins? – Gordon Linoff Jan 18 '13 at 20:31
  • Can you clarify your question so the accepted answer makes *some* sense relative to the problem you have? Right now you are talking about joins and that doesn't seem to be what you are after at all. – Aaron Bertrand Jan 19 '13 at 00:34

3 Answers


If you are trying to just get counts for each of these tables irrespective of the joins:

SELECT
  OBJECT_SCHEMA_NAME([object_id]),
  OBJECT_NAME([object_id]),
  c
FROM
( 
  SELECT [object_id],
    c = SUM(row_count)
  FROM
    sys.dm_db_partition_stats -- no NOLOCK necessary
  WHERE
    index_id IN (0,1)
    AND OBJECT_NAME([object_id]) IN
    ( 
      N'test_setup_details_form_view',
      N'test_setup_header_form_view',
      ... etc etc. ...
    )
  GROUP BY [object_id]
) AS x;
Aaron Bertrand

Use `COUNT(DISTINCT <columnname>)` on a unique column for each table that you need to count.
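Applied to the query in the question, that would look something like the sketch below. The column choices are assumptions: each one must actually be unique within its own table or view for the counts to be meaningful.

```sql
-- Sketch only: assumes each chosen column is unique within its own
-- table/view, which must be verified against the real schema.
SELECT
    COUNT(DISTINCT [tsdf].[test_setup_details_id]) AS [tsdf_rows],
    COUNT(DISTINCT [test_setup_header_form_view].[test_setup_header_id]) AS [header_rows],
    COUNT(DISTINCT [th].[test_header_id]) AS [th_rows],
    COUNT(DISTINCT [tdae].[event_log_id]) AS [tdae_rows],
    COUNT(DISTINCT [e].[event_log_id]) AS [e_rows]
FROM [test_setup_details_form_view] [tsdf]
JOIN [test_setup_header_form_view]
    ON [test_setup_header_form_view].[test_setup_header_id]
        = [tsdf].[test_setup_header_id]
JOIN [test_header_rv] [th]
    ON [th].[test_setup_header_id]
        = [test_setup_header_form_view].[test_setup_header_id]
JOIN [test_details_answers_expanded_view] [tdae]
    ON [tdae].[test_setup_details_id] = [tsdf].[test_setup_details_id]
        AND [th].[test_header_id] = [tdae].[test_header_id]
JOIN [event_log_rv] [e]
    ON [e].[event_log_id] = [tdae].[event_log_id];
```

Note that this counts only the key values that survive the inner joins, not the total rows in each table.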

Aaron Kurtzhals
  • How will that help if each table will be filtered down to only the rows that match the join? – Aaron Bertrand Jan 18 '13 at 20:31
  • @Darwin why? Are you sure everybody understands the OP's requirements? And what if some of the tables have compound primary keys (or no defined or candidate keys)? – Aaron Bertrand Jan 18 '13 at 20:37
  • Counting the total number of rows in a table is a rather trivial question, so I feel safe assuming that was not the OP's question. Compound primary keys or lack of any candidate key definitely would make the problem more difficult. – Aaron Kurtzhals Jan 18 '13 at 20:53
  • How do you measure "rather trivial"? Based on whose knowledge and experience? – Aaron Bertrand Jan 18 '13 at 21:06

From each table? Why not use the metadata tables then?

You are trying to do something in code that already exists in the metadata tables:

SELECT
    schema_name(schema_id) + '.' + t.name AS TableName,
    i.rows
FROM sys.tables t (nolock)
    JOIN sys.sysindexes i (nolock) ON t.object_id = i.id
        AND i.indid < 2;
djangojazz
  • `sys.sysindexes` is a backward compatibility view and I don't believe it is guaranteed to be up to date. In 2005 and above, much safer to use `sys.dm_db_partition_stats` or `sys.partitions`. Also about blindly throwing `NOLOCK` everywhere: http://dba.stackexchange.com/questions/33074/setting-read-uncommitted-when-reading-dmvs/ – Aaron Bertrand Jan 18 '13 at 20:36
  • NOLOCK is suggested as a method to use by the very article you gave. It prevents blocking; the reason to use it on metadata tables is that you are querying the very containers carrying your tables' data. If you lock them while querying their detail, what does that get you? Why should I be locking things as a default operation on metadata tables? – djangojazz Jan 18 '13 at 20:46
  • One answer suggested it, the other showed that it makes no difference. The suggestion was most likely due to underlying things that were locking as a side effect, e.g. `has_perms_by_name` and `OBJECTPROPERTY()`. The `NOLOCK` here makes absolutely no difference for this query, and just proliferates the "`NOLOCK` is a magic make-it-go-faster switch." – Aaron Bertrand Jan 18 '13 at 20:49
  • I don't think that, I just think it is good practice. Do you think when you query objects it is better to just practice a locking serialization level? My argument is more along the lines of 'what is wrong with nolock'? You are saying not to use it and thought enough to state it, why? – djangojazz Jan 18 '13 at 20:51
  • Because of exactly what I said - I see way too many people suggesting `NOLOCK` as a "best practice" when it is anything but. There are some cases where you may want to play it safe, but you need to fully understand the ramifications. When I see it blindly applied in answers that don't involve questions specifically about locking behavior, yes, I have to say something, sorry. It is not a best practice and never will be. – Aaron Bertrand Jan 18 '13 at 20:57
  • I didn't catch your whole comment before it was deleted, but let me make two points: (1) While *you* may know how it works and all that, readers of your answer may not, and may take it as one more argument *for* using `NOLOCK` everywhere. (2) Can you please demonstrate evidence that this specific query benefits in any way from the addition of `NOLOCK`? Can you explain why - if it is so important that you use `READ UNCOMMITTED` - you don't at least call out why it is necessary, and perhaps consider using `SET TRANSACTION ISOLATION LEVEL` instead of adding `NOLOCK` to every object? – Aaron Bertrand Jan 18 '13 at 21:24
  • I would be interested to hear more than 'some cases'. The some cases I have found are only when performing creation, updates, and billing reporting. Other than that most industries I have worked in wanted (nolock)s set when having large SQL Servers with multiple connections coming in and hitting similar tables. I want to know 'some instances' when locking is better than no locks for queries. – djangojazz Jan 18 '13 at 21:43
  • Honestly when you do this: Set transaction isolation level read uncommitted and then some code, most people miss it or don't copy it. The (nolock) are embedded in the code so even if someone takes part of the code they are forced to use it unless they removed. I like talking about this kind of stuff though. Honestly I have never spent a ton of time on isolation levels beyond snapshot, uncommitted, and committed. I understand what they do in theory but only in very high performance systems where you are tweaking by the second are you really setting levels to keep things out. – djangojazz Jan 18 '13 at 21:49
  • Right, and when you want to REMOVE the `NOLOCK` because you realize it is wrong, leads to bad data, etc., you have to go remove it from every single table. If the question wasn't *about* the isolation level, why do you care if they take the isolation level with them or not? – Aaron Bertrand Jan 18 '13 at 21:50
  • Jonathan talks about [one case here](http://www.sqlskills.com/blogs/jonathan/finding-what-queries-in-the-plan-cache-use-a-specific-index/) and it is because he explicitly does not care about a phantom read (and I haven't even validated that his concerns are realistic, given that for several other DMVs the `NOLOCK` is just wasted typing). So if you want to thoroughly explain to users exactly what the potential consequences are, then go ahead and recommend `NOLOCK` as long as it comes with that explanation. If you can't explain it, then you probably shouldn't be propagating it as a best practice – Aaron Bertrand Jan 18 '13 at 21:52
  • You have not proved that the uncommitted isolation level is bad; you have proved you don't like it, thus far. Let me ask you this: what is worse for a database, the possibility of locks from people querying data, or the possibility of them getting dirty data? I value the database not getting locked more than some edge case I have never run into but that on paper could happen. – djangojazz Jan 18 '13 at 22:08
  • Again, can you please demonstrate with an example how your query prevents locks from user queries by adding `NOLOCK`? Paranoia about locking does not make locking transpire, and applying `NOLOCK` everywhere is not useful if it is only relevant in a few isolated cases. – Aaron Bertrand Jan 18 '13 at 22:09
  • Maybe give these two a read. A lot of the time, you WANT locks because that's what guarantees you are displaying consistent data. The times you don't care about consistency should be few and far between if you are using SQL Server. I don't think it's necessary to draw out all of `NOLOCK`'s cons in a comment thread just to satisfy you. You are recommending it as a best practice; YOU defend it. http://stackoverflow.com/questions/686724/sql-when-should-you-use-with-nolock http://stackoverflow.com/questions/1452996/is-the-nolock-sql-server-hint-bad-practice – Aaron Bertrand Jan 18 '13 at 22:17
  • "First do no harm" is actually not the primary concern of an MD. That's a myth. – swasheck Jan 18 '13 at 22:18
  • "The question is what is worse, a deadlock or a wrong value? For financial databases, deadlocks are far worse than wrong values." My point exactly. You are defending locking, yet the threads you list keep agreeing with me more than with you. I don't get it at all. – djangojazz Jan 18 '13 at 22:20
  • Can you demonstrate a deadlock using the query in your answer after removing `NOLOCK`? Again, this sounds like paranoia and it's not healthy to walk around trying to convince every user that they should be using `NOLOCK` on every query simply because "locks are bad." Also I have no idea why you think the threads I'm posting agree with you and not with me. Did you actually ***READ THEM***? – Aaron Bertrand Jan 18 '13 at 22:21
  • Also, do you really believe that it's ok to show a customer that his balance is $20,000 instead of $10,000 because you used `NOLOCK` to prevent blocking? (Also, can you show me an example where `NOLOCK` prevents a deadlock that would happen without it?) – Aaron Bertrand Jan 18 '13 at 22:26
  • This might be a more useful answer for you to focus on, for example: http://stackoverflow.com/a/5469238/61305 And again, did you read [this answer](http://dba.stackexchange.com/a/33076/1186), where Martin demonstrates that `NOLOCK` *has no effect*? – Aaron Bertrand Jan 18 '13 at 22:36
  • I am not talking about transactional reporting for accounting being accurate. If that were the case, I would ask why you are not using snapshot isolation, a snapshot of the database, or a reporting repository. If you want accurate data, you design a system for that. If you want to PREVENT LOCKS on committing transactions because someone is selecting a million rows, then set NOLOCK. I read your articles; did you see I quoted parts of them? – djangojazz Jan 18 '13 at 22:42
  • yep, you quoted parts of them that support your "you should use `NOLOCK` everywhere" argument. You seem to have skipped over the parts that don't, and my repeated requests for proof that `NOLOCK` is helpful or solves anything in the query above (or probably many other queries where you are using it but aren't clear on all of the potential ramifications). Just because other people have some misconceptions about how this works (or don't realize how harmful it can be) does not mean you should promote it as a best practice too. – Aaron Bertrand Jan 18 '13 at 22:56
  • It is far more harmful to lock a transactional production database than to present dirty data. Yes, I know there are financial reporting numbers that can be inaccurate. Why would you be reporting off of live data anyway for critical reporting? I have QA'd and user-tested reports that do that, from a database far removed from production. What if someone updates a bill later and removes or changes it before end of day? "Well, the data I present at 11:00 AM must be completely accurate because I used read committed." I don't believe the financial accuracy outweighs locking entire tables. – djangojazz Jan 18 '13 at 23:29
  • There are ways to avoid "locking entire tables" without slapping NOLOCK on every table, and guess what, you can still have accuracy, too. Also, when did I utter anything about reporting? I'll ask again, can you demonstrate how NOLOCK helps THIS query? – Aaron Bertrand Jan 19 '13 at 00:32
  • It doesn't help the query speed; where did I say that? Using NOLOCK is like using a seat belt: 99% of the time you won't need it at all. However, sometimes you are selecting millions of rows while something is trying to insert into that same table. I would rather data be able to insert and update than lock tables because I want real data that committed in production. It is a setting that lets you run a little hotter and more dangerous with your timings if you are protected against locks. It is not required; it is a choice, knowing what it does. Why does it bother you? – djangojazz Jan 19 '13 at 00:42
  • Because the more often it appears in answers (especially *accepted* answers), the more people will take it as gospel that this is the right way to do it all the time. I'm not sure what industries you work in, but in 99% of the scenarios I have come across in my career, dirty reads were NOT okay. Again, YOU may understand the consequences (or you may not), but it's not fair to make the assumption that all of your readers do. In the end it's just more consulting money for someone when they come in to fix it, because if you're using it to avoid locks, it's broken. – Aaron Bertrand Jan 19 '13 at 01:08
  • This discussion is growing too long and hard to follow. However it contains good information which should be integrated into the post. Please do that and continue the discussion in the chat! – markus Jan 19 '13 at 14:18