4

I cannot think of any reason why we need to have multiple files inside a filegroup. My reasoning is this: from the T-SQL (end-user) level we can control which filegroup an object goes to, but we cannot control which individual file within a filegroup is used. Any comments or ideas on why files are still needed?
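To illustrate what I mean (a minimal sketch; the database and file names are made up), T-SQL lets you place a table on a filegroup, but there is no clause to place it on a specific file within that group:

```sql
-- Hypothetical names; shows that T-SQL targets filegroups, not files.
ALTER DATABASE SalesDb ADD FILEGROUP FG_Data;

ALTER DATABASE SalesDb ADD FILE
    (NAME = SalesData1, FILENAME = 'D:\Data\SalesData1.ndf'),
    (NAME = SalesData2, FILENAME = 'E:\Data\SalesData2.ndf')
TO FILEGROUP FG_Data;

-- You can say "put this table ON FG_Data" ...
CREATE TABLE dbo.Orders (OrderId INT PRIMARY KEY) ON FG_Data;
-- ... but there is no "ON <file>" syntax; SQL Server decides
-- how to fill the files within the group.
```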

thanks in advance, George

Nick Kavadias
George2

3 Answers

5

Having multiple files per file group is only useful for the following reasons:

  1. Distributing disk I/O load over multiple disks for performance reasons, e.g. in cases where reconfiguring the RAID array with additional disks is not possible, or there is no RAID at all.
  2. Cases where you have a VLDB and do not wish to deal with very large single files for logistical reasons.
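As a sketch of reason 1 (database name and file paths are hypothetical), one filegroup can be backed by files on separate physical drives, so reads and writes against objects in that filegroup are spread across both spindles:

```sql
-- Hypothetical example: one filegroup, two files on different drives.
ALTER DATABASE BigDb ADD FILEGROUP FG_Spread;

ALTER DATABASE BigDb ADD FILE
    (NAME = Spread1, FILENAME = 'D:\Data\Spread1.ndf', SIZE = 10GB),
    (NAME = Spread2, FILENAME = 'E:\Data\Spread2.ndf', SIZE = 10GB)
TO FILEGROUP FG_Spread;

-- Any table created ON FG_Spread now has its pages allocated
-- across both files (and therefore both drives).
```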

There is an 'urban legend' that SQL Server uses only one thread per file, so that the number of files should match the number of CPUs. This is false, however, as discussed by Microsoft here.

Historically, there is another reason. Believe it or not, in the days of SQL Server 4.2 through 7, SQL Server was sometimes installed on FAT file systems, which had a 4 GB file-size limit. The ability to chain files together (in what we now call filegroups) was a way to work around file-system limitations and allow databases larger than 4 GB on FAT-based installs.

EBarr
Nick Kavadias
  • 1. "Distributing disk I/O load over multiple disks for performance reasons" -- I think you mean that using multiple files gives better performance than using one file, even with the same number of disks in the disk array; why? 2. "logistical reasons" -- what does "logistical" mean? Could you say it in other words? – George2 Feb 23 '09 at 12:06
  • I read the document you recommended and like it very much! Cool! I want to know whether it is a formal SQL Server product announcement or just one technical guy's personal thoughts? :-) – George2 Feb 23 '09 at 12:07
  • It has nothing to do with performance; but, for example, if you're doing things like copying mdf files around, then copying one large 2 TB file is much more difficult than copying four 500 GB files. – Nick Kavadias Feb 24 '09 at 04:12
  • It's the closest to a formal announcement you'll get. The blog is from Microsoft's customer support team. They'd have accurate sources inside the SQL dev team, so I'd say it's a reliable source. – Nick Kavadias Feb 24 '09 at 04:16
  • Thanks Nick. From a T-SQL developer perspective, can we control anything at the file level? Currently I only know how to control things at the filegroup level. – George2 Feb 24 '09 at 14:26
  • Hi Nick Kavadias, I read the document you recommended, but it never mentions that more files mean better performance. From which statement do you conclude that it claims multiple files inside a filegroup mean better performance? – George2 Feb 24 '09 at 14:26
2

Old thread, I know, but here is what makes sense to me: back in the day, the maximum file size on the Windows FAT32 file system was 2 GB. If your database file got bigger, you were stuck (it happened to me with an MS Access database once). Hence you were allowed to define a maximum file size (say, 2 GB) and add more files; if your database grew and the maximum size was exceeded, the next file got filled until that one was full, and so on. All those files can be addressed as one filegroup. You can define a table's data location by choosing a filegroup, but you don't see which file within that filegroup the table data will end up in; all you know is that your table's data can end up in any of the files within the filegroup. Through this "splitting", your file system never sees a file larger than the maximum file size (here: 2 GB), even though tables in your database can be many times larger.

Today, setting up multiple files can be useful for having large data files "chopped" into smaller pieces for file-based backup. Ask your network admins what they want, because during backup, writing one large file (say, 1 TB) into a partition takes a long time, even on a fast RAID, and all other write operations would have to wait. Shorter waiting intervals let high-priority operations get to execution more quickly.

If you care about parallel access to the same table, consider horizontal partitioning, as in http://msdn.microsoft.com/en-us/library/ms188730%28v=sql.105%29.aspx. This allows you to spread the data of one table over different hard disks, like "all sales of January on disk R:", "all sales of February on disk S:", without creating separate tables. When partitioning a table, you can define which part shall go to which filegroup.
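The partitioning idea at the end could be sketched like this (all names are hypothetical; see the linked MSDN page for the full syntax): a partition function splits rows by date, and a partition scheme maps each range to its own filegroup, which can live on its own disk.

```sql
-- Hypothetical sketch: map monthly ranges to different filegroups.
CREATE PARTITION FUNCTION pfSalesByMonth (DATE)
AS RANGE RIGHT FOR VALUES ('2009-02-01', '2009-03-01');

CREATE PARTITION SCHEME psSalesByMonth
AS PARTITION pfSalesByMonth
TO (FG_Jan, FG_Feb, FG_Mar);  -- e.g. filegroups on disks R:, S:, T:

-- One logical table whose rows land on different disks by month.
CREATE TABLE dbo.Sales
(
    SaleDate DATE  NOT NULL,
    Amount   MONEY NOT NULL
) ON psSalesByMonth (SaleDate);
```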

Der U
1

I could provide a long explanation, but MSDN does a good job of it here. It may be that you specifically don't need more than one file in a filegroup, but that is not true of everybody.

colithium
  • Hi colithium, I read this document before, and I also read the related section of the SQL Server 2008 Books Online. But it never mentions why we need more than one file in a filegroup. If I missed anything in the document that says specifically why more files are good, please let me know. – George2 Feb 23 '09 at 02:57
  • "distribute I/O across multiple drives" and "Some systems can improve their performance by controlling the placement of data and indexes onto specific disk drives. Filegroups can aid this process." – colithium Feb 23 '09 at 03:43
  • Agreed, filegroups can improve performance. But my question is not about filegroups; my question is why we need to create multiple files inside a filegroup. As we cannot control from T-SQL which file inside a filegroup is used, I cannot think of an advantage of, or a scenario for, using multiple files. – George2 Feb 23 '09 at 04:45