
I just wanted to ask: is it safe to design a FileTable that will eventually hold about 5-15 million files of 0.5-10 MB each?

Will NTFS handle it?

I once had a problem on an old Windows Server 2008 R2 machine: when a folder held more than 2.5 million files, creating a new file inside that folder took about 30 seconds, and getting the file list took about 5 minutes. Is that an NTFS problem?

Could that be a problem here? Or will FILESTREAM/FileTables create subfolders themselves to handle that many files?

Or is disabling the 8.3 naming convention enough to make it work fine?

Thanks and regards

d00lar
  • Could you tell us the percentage of files larger than 1 MB? – gotqn Jan 29 '20 at 11:08
  • >50%. Why do you ask? Does it matter? – d00lar Jan 29 '20 at 11:09
  • https://stackoverflow.com/a/20592796/1080354 - because you can end up with some hybrid solution: storing small files in the database and large ones on FILESTREAM, or, if all of them are small, storing them all in the database. – gotqn Jan 29 '20 at 11:11
  • OK, thanks, but assume that I want to use just FILESTREAM/FileTables. Will it handle the situation I'm describing? – d00lar Jan 29 '20 at 11:17
  • Could you test on a similar environment? You can generate 5 million dummy records and test it yourself. That is the surest way. – gotqn Jan 29 '20 at 11:19
  • Sure, I can, but I think everyone has had the situation where something worked fine in dev/dummy tests and didn't in production. That's why this website exists, so we can share experience. – d00lar Jan 29 '20 at 11:21
  • Well, that's why you should build your environments via scripts, both dev and production. So that you know you're running the same configuration, before you move things from dev to production. Also, there's some [tips from Microsoft on when to use FILESTREAM and for what types of files](https://learn.microsoft.com/en-us/sql/relational-databases/blob/filestream-sql-server?view=sql-server-ver15#when-to-use-filestream). – Radu Gheorghiu Jan 29 '20 at 11:23
  • I saw this, but it does not clarify my question. Performance is better with smaller files, OK, but can a really large number of files kill this? Or does the number of files not matter? – d00lar Jan 29 '20 at 11:28
  • [See also](https://stackoverflow.com/q/197162/4137916). File creation time might increase due to fragmentation of the directory index file, and regular maintenance by defragmenting that may be a good idea. Getting file lists is often delayed by tools sorting or otherwise preparing the lists for display, but that does not reflect an inherent problem in NTFS itself (a function like `FindFirstFile` should enumerate immediately, regardless of the number of files). Note that testing is never wrong -- from 2008 R2 to now NTFS has certainly undergone improvements as well. – Jeroen Mostert Jan 29 '20 at 13:03
  • See also [this](https://www.sqlskills.com/blogs/paul/filestream-directory-structure/) for background info on how `FILESTREAM` data is stored -- it's not all in one big directory, so if you are concerned about *possible* performance problems in advance, it makes sense to build your solution so it is at least prepared to distribute data across directories by having multiple tables/columns (unified with a view) or even multiple filegroups (you can't add files to a `FILESTREAM` FG, but you can have multiple FGs, each with their own root directory). – Jeroen Mostert Jan 29 '20 at 13:15 (a T-SQL sketch of this kind of layout follows the comment thread)
  • So it will not create any subdirectories by itself, but if I split this into, let's say, 10 tables with 1 million files each, and each table is in a separate filegroup (so a separate directory), it will be almost perfect? – d00lar Jan 29 '20 at 13:18
  • Unless you've actually already run tests (more recent than the ones you once did for an old Windows 2008 R2 server, without, I gather, SQL Server creating and accessing the files), I wouldn't make any assumptions on what setup is "almost perfect". There's no need to pre-emptively make things more complicated than necessary; it suffices to not make them so that change is extremely costly later. Replacing a table with a view is typically not difficult; replacing a view with a different view is even less difficult. – Jeroen Mostert Jan 29 '20 at 13:22
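
A minimal, hypothetical T-SQL sketch of the layout discussed in the comments above: FILESTREAM data split across two filegroups (and therefore two root directories on disk), with the tables unified by a view so the split can be changed later. All table, filegroup and column names are invented, and the filegroups are assumed to already exist.

```sql
-- Hypothetical sketch; FsGroupA and FsGroupB are assumed to be existing
-- FILESTREAM filegroups, each with its own container (root directory).
-- FILESTREAM columns require a ROWGUIDCOL column with a unique constraint.
CREATE TABLE dbo.DocumentsA
(
    Id      UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    Name    NVARCHAR(260)    NOT NULL,
    Content VARBINARY(MAX)   FILESTREAM NULL
) FILESTREAM_ON FsGroupA;

CREATE TABLE dbo.DocumentsB
(
    Id      UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    Name    NVARCHAR(260)    NOT NULL,
    Content VARBINARY(MAX)   FILESTREAM NULL
) FILESTREAM_ON FsGroupB;
GO

-- Readers go through the view, so the underlying split can be changed
-- later without touching application code.
CREATE VIEW dbo.Documents AS
    SELECT Id, Name, Content FROM dbo.DocumentsA
    UNION ALL
    SELECT Id, Name, Content FROM dbo.DocumentsB;
GO
```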

2 Answers


Will NTFS handle it?

Yes. Just do not open File Explorer. THAT - the program, not the operating system - cannot handle it. The command line, or a server process that does not try to load all the files into a list, works fine.

TomTom
  • So this issue that I had in the first post, 'creating a new file inside that folder took about 30 seconds', was not related to NTFS? I was creating that new file programmatically, without making a directory, etc. – d00lar Jan 29 '20 at 12:00

In my experience, in short, yes, NTFS can handle it, but avoid exploring FILESTREAM directories (Explorer can't handle this volume of files; it will crash). Some white papers recommend using FILESTREAM when files are 256 KB or larger, but the performance benefit is most evident with files larger than 1 MB.

Here are some recommended best practices (a short T-SQL sketch of the filegroup layout follows the list):

  • Disable the indexing service on the NTFS volumes where FILESTREAM data is stored.
  • Use multiple data containers for the FILESTREAM filegroup, on separate volumes.
  • Configure the correct NTFS cluster size (64 KB is recommended).
  • Configure the antivirus so it cannot delete FILESTREAM files, or your database will be corrupted.
  • Disable the last access time attribute.
  • Defragment the disk regularly.
  • Disable short file names (8dot3).
  • Keep FILESTREAM data containers on a separate disk volume from the data and log files (mdf, ndf and ldf).
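
For the two filegroup-related items above, a rough, hypothetical sketch (database, filegroup and path names are invented; instance-level FILESTREAM must already be enabled in SQL Server Configuration Manager, and multiple containers per FILESTREAM filegroup require SQL Server 2012 or later):

```sql
-- Allow both T-SQL and Win32 streaming access to FILESTREAM data.
EXEC sys.sp_configure N'filestream access level', 2;
RECONFIGURE;
GO

-- Hypothetical database: one FILESTREAM filegroup with two containers,
-- each on its own volume, away from the mdf/ndf/ldf drives.
-- For each container the parent folder must exist, the leaf folder must not.
ALTER DATABASE FileStore ADD FILEGROUP FsData CONTAINS FILESTREAM;

ALTER DATABASE FileStore
    ADD FILE (NAME = FsData1, FILENAME = 'E:\FsContainers\FsData1') TO FILEGROUP FsData;
ALTER DATABASE FileStore
    ADD FILE (NAME = FsData2, FILENAME = 'F:\FsContainers\FsData2') TO FILEGROUP FsData;
GO
```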

Right now, we're running some tests to migrate our FileUpload database (8 TB and growing, with 25 million records) from varbinary(max) to FileTable. Our approach is to split the very large database into one database per year.
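
For reference, a minimal, hypothetical sketch of one of those per-year databases (database, directory and table names are invented, and the database is assumed to already have a FILESTREAM filegroup). FileTables require non-transactional access to be enabled, and each FileTable exposes its files under its own directory in the FILESTREAM share:

```sql
-- Hypothetical per-year database; assumes it already has a FILESTREAM filegroup.
ALTER DATABASE FileUpload2020
    SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'FileUpload2020');
GO

USE FileUpload2020;
GO

-- One FileTable; SQL Server manages the schema, we only choose the
-- directory name the files appear under in the FILESTREAM share.
CREATE TABLE dbo.Uploads AS FILETABLE
WITH
(
    FILETABLE_DIRECTORY = N'Uploads',
    FILETABLE_COLLATE_FILENAME = database_default
);
GO
```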

I would like to know whether you are currently running this in a production environment, and to hear about your experience.

You can find more info in a free ebook: Art of FileStream

ByAlaN