
I have a 30 GB table with 30-40 columns. I create reports from this table and it causes performance problems, yet the reports only use 4-5 of its columns. Because of that, I want to create a second table for the reports. But the second table must be kept up to date whenever the original table changes, without using triggers.

No matter what my query is, when it executes SQL Server tries to cache all 30 GB. When the cache is fully loaded, SQL Server starts to use the disk. I want to avoid this.

How can I do this? Is there a way of doing it using SSIS? Thanks in advance.

Barny
  • I think you're going about this the wrong way; if you're having performance issues with a query, then that leads me to suspect that it's either a poorly tuned query or you don't have the table indexed appropriately. – Stuart Ainsworth Jun 22 '12 at 13:07

4 Answers

CREATE VIEW myView
AS
SELECT
  column1,
  column3,
  column4 * column7 AS derivedValue  -- an expression in a view must be given a column name
FROM
  yourTable

A view is effectively just a stored query, like a macro. You can then select from that view as if it were a normal table.
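For example (re-using the made-up column names above), any WHERE clause in the outer query is applied to the underlying table, so only the rows you ask for are read:

SELECT *
FROM myView
WHERE column1 BETWEEN 10 AND 15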

Unless you go for materialised views, it's not really a table, just a query. So it won't speed anything up, but it does encapsulate code and helps control what data different users/logins can read.

MatBailie
  • I don't want to read the data in every query because it is about 30 GB; I just want to make a small replica, but the data must be consistent – Barny Jun 22 '12 at 12:02
  • @Jabajubhyl - If you put `SELECT * FROM myView WHERE Column1 BETWEEN 10 AND 15` or any other WHERE clause, it only reads the data that you need. Just like a normal query. The fact that the view doesn't have a WHERE clause doesn't matter; the view's code is copied into whatever query uses the view, and so picks up the WHERE clause, JOIN predicates, etc from there. – MatBailie Jun 22 '12 at 12:05
  • But I will work with 30 GB of data whenever I call the view, am I right? – Barny Jun 22 '12 at 12:10
  • @Jabajubhyl - It will be the same as working with the underlying table. This answer answers your question, but it appears that it doesn't address what you need. What *exactly* are you trying to achieve? Do you want to cache the results of a long winded SQL Query? This will only really be of benefit if you end up with fewer *rows* rather than fewer *columns* (Unless you're going from 4000 columns to 10 columns, and even then it won't be a *huge* saving). If it's caching results, show us the code you need caching and the underlying data's schema. If not, elaborate on exactly what you need. – MatBailie Jun 22 '12 at 12:31
  • I have a 30 GB table, which has 30-40 columns. I create reports using this table and it causes performance problems. I only use 4-5 columns of this table for the reports, so I want to create a second table for the reports. But the second table must be updated when the original table is changed – Barny Jun 22 '12 at 12:54
  • Reducing the number of columns in the table won't make the query go faster, only reducing the number of rows will. So answering the question you ask won't actually give you the result (increased performance) you want – Bort Jun 22 '12 at 13:04
  • Are you filtering the data for your report, or using all of the rows? Are the columns you are NOT interested in varchar? What's your query look like? – Stuart Ainsworth Jun 22 '12 at 13:09
  • @Jabajubhyl - You are probably best to show us the table structure and the query/queries that you run. Then we can give options on how to improve performance. It may be indexes, it may be a refactor of the code, it may be a materialised view. But in this case your desired solution (fewer columns) is unlikely to make a significant difference for you. – MatBailie Jun 22 '12 at 13:25

MatBailie's answer with the view seems ideal, but if you are truly looking for a new table, create it and have it automatically updated with triggers.

Triggers placed on the primary table can be added for all Insert, Update and Delete actions upon it. When the action happens, the trigger fires and can be used to do additional work, such as updating your new secondary table. You will pull the changed rows from the Inserted and Deleted tables (MSDN).
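As a rough sketch only (the table, column, and trigger names below are made up; adapt them to your own schema), an AFTER INSERT trigger that copies the report columns into the secondary table could look like this:

CREATE TRIGGER trg_BigTable_AfterInsert
ON dbo.BigTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- copy only the report columns of the newly inserted rows
    INSERT INTO dbo.ReportTable (Id, Column1, Column2, Column3)
    SELECT Id, Column1, Column2, Column3
    FROM inserted;
END

You would need similar triggers for Update and Delete (using the Deleted table) to keep the two tables consistent.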

There are many great existing articles here on triggers: Article 1, Article 2, Google Search

Ray K

You can create that second table just like you're thinking, and use triggers to update table 2 whenever table 1 is updated.

However, triggers present performance problems of their own; the speed of your inserts and updates will suffer. I would recommend looking for more conventional ways to improve query performance in SQL Server (which it sounds like you're using, since you mentioned SSIS).

Since it's only 4-5 out of 30 columns, have you tried adding an index which covers the query? I'm not sure if there are even more columns in your WHERE clause, but you should try that first. A covering index would actually do exactly what you're describing, since the table would never need to be touched by the query. Of course, this does cost a little in terms of space and insert/update performance. There's always a tradeoff.
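For instance, a covering index might look something like this (the index, table, and column names here are placeholders only), with the column(s) from your WHERE clause as the key and the remaining report columns in the INCLUDE list:

CREATE NONCLUSTERED INDEX IX_BigTable_Report
ON dbo.BigTable (ReportDate)          -- column(s) used to filter the report
INCLUDE (Column1, Column2, Column3);  -- the other columns the report selects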

On top of that, I can't believe that you would need to pull a large percentage of rows out of a 30 GB table for any given report; that's simply too much data for a report. A filtered index can improve query performance even more by only indexing the rows that are most likely to be asked for. If you have a report which lists the results for the past calendar month, for example, you could add a condition to only index the rows WHERE report_date > '5/1/2012'.
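A rough illustration, again with placeholder names and an arbitrary cut-off date:

CREATE NONCLUSTERED INDEX IX_BigTable_Report_Recent
ON dbo.BigTable (report_date)
INCLUDE (Column1, Column2, Column3)
WHERE report_date > '20120501';  -- only the rows the report is likely to ask for are indexed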

Jason
  • Even if my query shows improved performance with a filtered index, SQL Server tries to cache all 30 GB. When the cache is fully loaded, SQL Server starts to use the disk. I actually want to avoid this. – Barny Jun 22 '12 at 13:25

If you are using SQL Server, what you want is an indexed view. Create a view using the columns you want and then place an index on them.

An indexed view stores the data in the view. It should keep the view up-to-date with the underlying table, and it should reduce the I/O for reading the table. Note: this assumes that your 4-5 columns are much narrower than the overall table.
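A sketch of what that could look like, with placeholder object and column names:

-- the view must be schema-bound before it can be indexed
CREATE VIEW dbo.ReportView
WITH SCHEMABINDING
AS
SELECT Id, Column1, Column2, Column3
FROM dbo.BigTable;
GO

-- the unique clustered index is what actually stores (materialises) the view's data
CREATE UNIQUE CLUSTERED INDEX IX_ReportView
ON dbo.ReportView (Id);

Bear in mind that indexed views come with restrictions (schema binding and limits on what the view may contain), so check the documentation for your edition of SQL Server.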

Gordon Linoff