
I am trying to partition a huge table in SQL Server. It is 15 TB with millions of records. The main idea is to save incoming data onto different disks by creating a new filegroup, because the current disk can't be expanded any more.

I created the needed filegroups, files, partition scheme, and partition function, but when I created the clustered index it took too much space (more than 200 GB) and was still running, so I stopped it.
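
For reference, the steps looked roughly like this (all object names, paths, sizes, and boundary values below are illustrative, not my real ones):

```sql
-- New filegroup and file on the new disk (names/paths illustrative)
ALTER DATABASE ArchiveDb ADD FILEGROUP FG2022;
ALTER DATABASE ArchiveDb ADD FILE (
    NAME = 'ArchiveDb_FG2022_1',
    FILENAME = 'F:\Data\ArchiveDb_FG2022_1.ndf',
    SIZE = 100GB
) TO FILEGROUP FG2022;

-- Partition function and scheme
CREATE PARTITION FUNCTION pfByYear (datetime2)
    AS RANGE RIGHT FOR VALUES ('2021-01-01', '2022-01-01');
CREATE PARTITION SCHEME psByYear
    AS PARTITION pfByYear TO ([PRIMARY], [PRIMARY], FG2022);

-- This is the step I stopped: it rewrites the whole table onto the
-- partition scheme, which needs a huge amount of scratch space.
CREATE CLUSTERED INDEX cix_BigTable
    ON dbo.BigTable (CreatedDate)
    ON psByYear (CreatedDate);
```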

My question: is there a way to partition an existing huge table without creating a clustered index that takes this much space? Or is there a way to save newly arriving data to a different disk?

Ala' Mohsen
  • Of course, there is. It is called a "partitioned view". Please take a look: https://www.sqlshack.com/sql-server-partitioned-views/ – Sergey Oct 21 '21 at 11:18
  • The space won't be permanent. If you create a clustered index on a table that doesn't have one (or create one with drop_existing), SQL Server needs to build that _by making a copy of the data_. When it's done, the old one will go away. – Aaron Bertrand Oct 21 '21 at 11:45
  • @AaronBertrand So you are saying that I will need 15 TB of space to create the index, and then after it is created the space will be freed up? – Ala' Mohsen Oct 21 '21 at 11:48
  • @Sergey So a partitioned view is the same as a normal SQL view, but with constraints, right? – Ala' Mohsen Oct 21 '21 at 11:49
  • Correct (well, actually probably ~1.5x). But you can create the clustered index on a different filegroup (e.g. on the new filegroup). Better to move your log file there too. – Aaron Bertrand Oct 21 '21 at 11:49

2 Answers


To avoid the pain of creating a 15TB index (which you could, of course, create on a new filegroup on the new disks), you can instead create a new partitioned table from scratch (or a non-partitioned one, if you never need to manage, archive, or purge old data), start writing all new incoming data there, and slowly move the old data over.

Borrowing from my own answer here:

  • create a new partitioned table on the new disk, with enough partitions to comfortably divide up all the existing data plus a reasonable amount of future data
  • create a view that UNION ALLs the two tables (it may be simplest to rename the current table and replace it, temporarily, with a view having the old table name); see the sketch after this list
  • direct new writes to the new, partitioned table
    • hopefully you control writes via a stored procedure or a minimal number of places where you'd change the target name
    • if not, you can use an INSTEAD OF trigger on the view to redirect writes to the partitioned table
  • in the background, start DELETE TOP (@batchsize) FROM old_table OUTPUT deleted.* INTO new_table (see the loop sketch below)
    • it doesn't matter how long this takes to finish; the trick is to tune the batch size so it balances making progress against not causing blocking for too long, and to take log backups between every n batches if they're not already scheduled frequently enough (some more info on that here)
  • once all the data is backfilled, you can drop the old table, then change the view to no longer UNION ALL (or get rid of it and rename the new table)
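
A minimal sketch of the view and trigger pieces, assuming hypothetical names: dbo.BigTable_Old is the renamed original, dbo.BigTable_New is the new partitioned table, and both share the columns Id, CreatedDate, Payload:

```sql
-- The view takes over the old table's name, so readers keep working unchanged.
CREATE VIEW dbo.BigTable
AS
    SELECT Id, CreatedDate, Payload FROM dbo.BigTable_New
    UNION ALL
    SELECT Id, CreatedDate, Payload FROM dbo.BigTable_Old;
GO

-- Only needed if you can't redirect writers at the source:
-- an INSTEAD OF trigger routes new rows to the partitioned table.
CREATE TRIGGER dbo.BigTable_RedirectInsert ON dbo.BigTable
INSTEAD OF INSERT
AS
BEGIN
    INSERT dbo.BigTable_New (Id, CreatedDate, Payload)
    SELECT Id, CreatedDate, Payload FROM inserted;
END
GO
```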

If it takes two weeks to backfill all the old data into the new partition scheme, so what? Users aren't waiting for two weeks; they're only ever waiting on any individual batch (and even then, that'll largely just be the folks querying the data that's moving over, not the new data).
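
And a sketch of the background backfill, using the same hypothetical names; @batchsize is the knob to tune:

```sql
DECLARE @batchsize int = 10000;

WHILE 1 = 1
BEGIN
    -- Each iteration moves one batch in its own short transaction,
    -- so any blocking is limited to the rows in the current batch.
    DELETE TOP (@batchsize) FROM dbo.BigTable_Old
        OUTPUT deleted.Id, deleted.CreatedDate, deleted.Payload
        INTO dbo.BigTable_New (Id, CreatedDate, Payload);

    IF @@ROWCOUNT = 0 BREAK;  -- old table is empty; backfill done

    -- Under the full recovery model, make sure log backups run
    -- between batches so the loop doesn't balloon the log file.
END
```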

Aaron Bertrand
  • Thank you @Aaron, your solution is logical, but I want a quick solution; I don't care much about performance since the DB is used for archiving purposes. – Ala' Mohsen Oct 24 '21 at 19:47
  • Sure, but if you mean “quick to write” that will be at odds with “quick to execute” because of the problem you’ve already come up against. Creating a 15TB index in one shot is never going to be “quick,” even though there are certainly ways to do it that are quicker to write. – Aaron Bertrand Oct 24 '21 at 20:37

"The main idea is to save incoming data onto different disks by creating a new filegroup, because the current disk can't be expanded any more."

"Or is there a way to save newly arriving data to a different disk?"

Yes. Partitioning is gross overkill for this. Just add a new file to the existing filegroup and place the new file on a new disk. Almost all new allocations will come from the new file on the new disk, due to SQL Server's Proportional Fill Algorithm.
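
A minimal sketch, with hypothetical database name, file name, and path:

```sql
-- Add a second data file to the PRIMARY filegroup, placed on the new disk.
-- New allocations will mostly come from this file, since proportional fill
-- favors files with the most free space.
ALTER DATABASE ArchiveDb
ADD FILE (
    NAME = 'ArchiveDb_Data2',
    FILENAME = 'F:\Data\ArchiveDb_Data2.ndf',
    SIZE = 500GB,
    FILEGROWTH = 10GB
) TO FILEGROUP [PRIMARY];
```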

David Browne - Microsoft
    It's true, partitioning is overkill _just to solve a full disk problem_, but the technique I proposed doesn't require the new table to be partitioned - that's just a common requirement when people suddenly realize oh, crap, my table is 15TB, now what? And how do I manage this data going forward? I would probably rather move the data in the long run than leave it straddling these two disks (filling one of them). Also, if the 15TB table is a heap, well, just adding a file isn't going to fix all of the downsides of that, either. :-) – Aaron Bertrand Oct 21 '21 at 13:16
  • @David Thank you so much, this solution is perfect for me and works fine :) :). I set the initial size of the primary file based on the disk size I have (autogrowth is none) and added a new file on the new disk to the primary filegroup. – Ala' Mohsen Oct 24 '21 at 19:48