
Consider the following scenario:

The main table has millions of rows (around 40M+).

A related table has about 2M rows (let's call it table_b).

We are currently migrating from an Informix DB to MS-SQL, and I thought I might be able to optimize the data schema somewhat...

In the old DB, there is a trigger that copies the contents of the column "name" into the main table (maintable.table_b_name) in order to optimize filtering on "tableb.name".
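For reference, one plausible shape of such a copy trigger in T-SQL is sketched below. This is only an illustration of the update side: the trigger name and the referencing column maintable.table_b_id are assumptions, not taken from the actual schema.

    -- Sketch only: keep maintable.table_b_name in sync with table_b.name.
    -- Assumes maintable references table_b via a column called table_b_id.
    CREATE TRIGGER trg_table_b_copy_name
    ON table_b
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Copy the (possibly changed) name onto all referencing rows.
        UPDATE m
        SET    m.table_b_name = i.name
        FROM   maintable AS m
        JOIN   inserted  AS i ON i.id = m.table_b_id;
    END;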

I did some tests in the new DB, and it turns out that in some scenarios (select * from maintable) filtering via "exists (select 'x' from tableb where ...)" is just as fast as filtering on the copied column.

In other scenarios, however, filtering on the copied column is a lot faster (about twice as fast); to be more specific: "select a.column, b.column from maintable a inner join sometable on [SOMECONDITION]".
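To make the two scenarios concrete, these are roughly the query shapes I compared. Column names (@name, id, table_b_id, main_id, column1, column2) are illustrative placeholders, and the join condition stands in for [SOMECONDITION]:

    DECLARE @name nvarchar(100) = N'example';

    -- Scenario 1: plain filter on maintable; EXISTS vs. the copied column
    SELECT *
    FROM   maintable AS a
    WHERE  EXISTS (SELECT 'x'
                   FROM   table_b AS b
                   WHERE  b.id = a.table_b_id
                   AND    b.name = @name);

    SELECT *
    FROM   maintable AS a
    WHERE  a.table_b_name = @name;

    -- Scenario 2: same two filters, but with an extra inner join;
    -- here the EXISTS variant was roughly twice as slow.
    SELECT a.column1, s.column2
    FROM   maintable AS a
    INNER JOIN sometable AS s ON s.main_id = a.id  -- stands in for [SOMECONDITION]
    WHERE  a.table_b_name = @name;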

So, now my questions:

  • Would filtering on a copied column always be faster, or are we just missing some indexes? (The execution plan does not hint at any missing ones, though; see the index sketch further below.)
  • Why is "select * from maintable" about the same speed when filtering via "exists (subselect)" as when filtering on the copied column, but a LOT slower when there is a join involved?

I would very much like to eliminate the copy, because 40M rows of redundant data means a lot of wasted space (which isn't good from a performance perspective, I suppose).
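For reference, this is the kind of index I would expect to make the EXISTS variant competitive. It is a sketch only, built on the assumption that the subquery correlates on table_b.id and filters on table_b.name; whether it actually helps depends on the real join keys and selectivity:

    -- Covering index so the EXISTS subquery can be answered from the index alone.
    -- INCLUDE (id) is redundant if id is already the clustered primary key.
    CREATE NONCLUSTERED INDEX IX_table_b_name
        ON table_b (name)
        INCLUDE (id);

    -- The direct filter on the copied column benefits from its own index.
    CREATE NONCLUSTERED INDEX IX_maintable_table_b_name
        ON maintable (table_b_name);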

  • Proper indexes and queries are the best route. Ensure that your WHERE clause references fields in the order of the index to guarantee that the index will be used. You can also run the SQL query analyzer to determine whether things would run better with a new index; it will give you the statement to run to build the best index for that query. It is still up to you and your data team to determine whether that's the best change that can be made (e.g., could the query be more efficient without throwing another index at it?). – ps2goat Nov 18 '16 at 16:13
  • So you are saying redundant (copied) data is a big code smell and should be avoided at all costs? – johannes.colmsee Nov 18 '16 at 16:26
  • No. I'm sure we don't have all the facts or a reproducible environment (you can normally create one on sqlfiddle.com, but that could still be broken for your target environment: http://stackoverflow.com/questions/35735405/is-sqlfiddle-broken-errors-for-oracle-sql-server). What I mean to say is that regardless of what answers you get here, or from SQL Query Analyzer, you will need to run your own tests against your own data, as well as use your own judgement about what will work best in your scenario. – ps2goat Nov 18 '16 at 19:03
  • You could choose to move archivable data into a separate table so live queries run faster (assuming you have more old data than new). – ps2goat Nov 18 '16 at 19:04
  • Thanks for all the valuable input. I think the real culprit in my performance problems is the relation between the 40M and 30M tables. The relation is done via 5 fields, one of which is even linked via an OR condition (tablea.column = tableb.column_x1 or tablea.column = tableb.column_x2) instead of what you would usually have, a "link table". Unfortunately I cannot fix this; I don't know the structure well enough (it's my co-worker's code), or why there would even be an n-m relation (when one would suspect a 1-n relation judging from the names of the tables). – johannes.colmsee Nov 20 '16 at 12:49
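A side note on that last point: a join condition containing an OR, like the one described in the comment above, is often hard for the optimizer to index. A common rewrite (not tested against this data, just a sketch using the column names from the comment) splits it into two index-friendly joins combined with UNION:

    -- Original join condition (hard to use indexes on):
    --   ON a.[column] = b.column_x1 OR a.[column] = b.column_x2
    -- Rewritten as two separate joins; UNION (not UNION ALL) removes
    -- duplicate rows when both branches match the same pair.
    SELECT a.*, b.*
    FROM   tablea AS a
    JOIN   tableb AS b ON a.[column] = b.column_x1
    UNION
    SELECT a.*, b.*
    FROM   tablea AS a
    JOIN   tableb AS b ON a.[column] = b.column_x2;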

0 Answers