
I support a database that contains a schema that has a couple hundred tables containing our most important data.

Our application also offers APIs implemented as queries stored in NVARCHAR(MAX) fields in a Query table; these queries are written against the views as well as the tables in this critical schema.

Over time, columns have been added to the tables, but the APIs haven't always kept up.

I've been asked if I can find a way via SQL to identify, as nearly as possible (some false positives/negatives OK), columns in the tables that are not referenced by either the views or the SQL queries that provide the API output.

Initially this seemed doable. I've found some similar questions on the topic, such as here and here, that sort of give guidance on how to start...although I note that even with these, there's the kind of ugly fallback method that looks like:

OBJECT_DEFINITION(OBJECT_ID('[Schema].[View]')) LIKE '%' + [Column] + '%'

That approach is likely to generate false positives, and it will also be super slow when I'm trying to do it for a couple of thousand column names.

Isn't there anything better/more reliable? Maybe something that could compile a query down to a plan and be able to determine from the plan every column that must be accessed in order to deliver the results?

Jim Burnell
  • Is speed actually a consideration? Won't you only need to run this from time to time? – Dale K Sep 13 '20 at 23:17
  • Speed is not a consideration, except that I kind of need it to get me results within say the 16 hours that our development database server is up in a single day – Jim Burnell Sep 13 '20 at 23:18
  • The query you mention doesn't take that long does it? I would imagine a few minutes tops. – Dale K Sep 13 '20 at 23:19
  • My first attempt to search for one column had taken more than 3 minutes when I stopped it. I need to search for a couple of thousand columns. Also, column names are often not unique from table to table, so I'd ideally like to be able to distinguish Schema.Table1.TheColumn from Schema.Table2.TheColumn. I know it's asking for a lot. – Jim Burnell Sep 13 '20 at 23:23

1 Answer


Our application also offers APIs implemented as queries stored in NVARCHAR(MAX) fields

So you've reimplemented views? :)

If you make them actual views you can look at INFORMATION_SCHEMA and cross-reference table/columns against view/columns.
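For queries that are real views, that cross-reference could look roughly like this (a minimal sketch; 'MySchema' is a placeholder for the actual schema name, and note that VIEW_COLUMN_USAGE only tracks columns the view definitions reference directly):

```sql
-- Table columns in the target schema that no view references.
-- 'MySchema' is a placeholder schema name.
SELECT c.TABLE_SCHEMA, c.TABLE_NAME, c.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS AS c
JOIN INFORMATION_SCHEMA.TABLES AS t
  ON t.TABLE_SCHEMA = c.TABLE_SCHEMA
 AND t.TABLE_NAME   = c.TABLE_NAME
 AND t.TABLE_TYPE   = 'BASE TABLE'
LEFT JOIN INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS u
  ON u.TABLE_SCHEMA = c.TABLE_SCHEMA
 AND u.TABLE_NAME   = c.TABLE_NAME
 AND u.COLUMN_NAME  = c.COLUMN_NAME
WHERE c.TABLE_SCHEMA = 'MySchema'
  AND u.COLUMN_NAME IS NULL
ORDER BY c.TABLE_NAME, c.COLUMN_NAME;
```

If the dependency metadata turns out to be stale or incomplete, sys.dm_sql_referenced_entities is another option for column-level dependencies of persisted views.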

Assuming you don't want to do that, and you're prepared to write a job to run occasionally (rather than real-time) you could do some super-cheesy dynamic SQL.

  1. Loop through your definitions that are stored in NVARCHAR(MAX) with a cursor
  2. Create a temp view or SP from the SQL in your NVARCHAR(MAX)
  3. Examine INFORMATION_SCHEMA from your temp view/SP and put that into a temp holding table.
  4. Do this for all your queries, and then you've got a list of referenced columns
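The loop above might be sketched like this. All the object and column names here (dbo.Query, QueryId, QueryText, the throwaway view dbo.__ApiProbe) are hypothetical placeholders for whatever your schema actually uses:

```sql
-- Sketch: wrap each stored API query in a throwaway view, harvest the
-- columns it references from INFORMATION_SCHEMA, then drop the view.
CREATE TABLE #ReferencedColumns (
    QueryId      INT,
    TABLE_SCHEMA SYSNAME,
    TABLE_NAME   SYSNAME,
    COLUMN_NAME  SYSNAME
);

DECLARE @QueryId INT, @QueryText NVARCHAR(MAX), @Sql NVARCHAR(MAX);

DECLARE q CURSOR LOCAL FAST_FORWARD FOR
    SELECT QueryId, QueryText FROM dbo.Query;
OPEN q;
FETCH NEXT FROM q INTO @QueryId, @QueryText;
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        SET @Sql = N'CREATE VIEW dbo.__ApiProbe AS ' + @QueryText;
        EXEC sys.sp_executesql @Sql;

        INSERT INTO #ReferencedColumns
        SELECT @QueryId, u.TABLE_SCHEMA, u.TABLE_NAME, u.COLUMN_NAME
        FROM INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS u
        WHERE u.VIEW_NAME = '__ApiProbe';
    END TRY
    BEGIN CATCH
        -- Queries with parameters, temp tables, or multiple result
        -- sets won't compile as a view; log and handle those manually.
        PRINT CONCAT('Skipped query ', @QueryId, ': ', ERROR_MESSAGE());
    END CATCH;

    IF OBJECT_ID('dbo.__ApiProbe', 'V') IS NOT NULL
        DROP VIEW dbo.__ApiProbe;

    FETCH NEXT FROM q INTO @QueryId, @QueryText;
END;
CLOSE q; DEALLOCATE q;
```

Run it against a development copy, not production, since it creates and drops objects.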

Pretty ugly but should be workable for a tactical scan of your API vs database.

LoztInSpace
  • Thanks for pointing me this direction: I wasn't aware of INFORMATION_SCHEMA.VIEW_COLUMN_USAGE. I think it may get me enough of a "good enough" to get this off my plate! – Jim Burnell Sep 14 '20 at 14:57
  • Also...these queries take input parameters/temp variables, often populate temp tables, and return multiple result sets, and we've needed to be able to update them without making DDL changes, so... kind of re-implementing views, kind of not :) – Jim Burnell Sep 14 '20 at 15:06
  • Ah, so reinventing stored procedures as well as views :) anyway, glad you found the answer somewhat useful. – LoztInSpace Sep 14 '20 at 22:13
  • I did. I didn't know that COLUMN_USAGE existed. I ended up doing a hybrid solution where I filtered out all the fields accessed by the views and then did text search for each triplet (schema, table, column) against the nvarchar(max) fields. I was then able to declare victory and move on. I'll choose your answer as the winner; thank you! – Jim Burnell Sep 15 '20 at 16:52
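For the record, the hybrid approach described in that last comment could be sketched as a single query (again, 'MySchema', dbo.Query, and QueryText are placeholder names). Because column names are often reused across tables, the text search can match the wrong table's column, so this tends toward false negatives, as noted in the comments on the question:

```sql
-- Columns in 'MySchema' that no view references AND whose name never
-- appears in any stored API query text. Name collisions across tables
-- mean some genuinely unused columns may be missed.
SELECT c.TABLE_SCHEMA, c.TABLE_NAME, c.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS AS c
LEFT JOIN INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS u
  ON u.TABLE_SCHEMA = c.TABLE_SCHEMA
 AND u.TABLE_NAME   = c.TABLE_NAME
 AND u.COLUMN_NAME  = c.COLUMN_NAME
WHERE c.TABLE_SCHEMA = 'MySchema'
  AND u.COLUMN_NAME IS NULL
  AND NOT EXISTS (
        SELECT 1
        FROM dbo.Query AS q
        WHERE q.QueryText LIKE '%' + c.COLUMN_NAME + '%'
  )
ORDER BY c.TABLE_NAME, c.COLUMN_NAME;
```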