7

I need to write an application that will take a list of files (some large, some small) and fit them onto DVDs (or CDs, or whatever) as efficiently as possible. The whole point of this application is to use up as much of the 1st disc before moving onto the 2nd disc, filling the 2nd disc up as much as possible before moving onto the 3rd disc, etc.

(Note: The application doesn't have to do the actual burning to the DVD, it just has to figure out the best possible fit).

I initially thought I had a good game-plan by generating a permutation of the files and then checking each combination to see what fits the best. (My request for help on this can be found HERE)

But the more files there are, the longer it takes... exponentially. So I wanted some of your opinions on how to best achieve this.

Any ideas? And, as always, C# code is always appreciated.

Community
  • 1
  • 1
Jason Thuli
  • 736
  • 2
  • 8
  • 23

9 Answers9

12

What you're facing is related to the knapsack problem. The linked wikipedia page has lots more information, including suggested ways of solving it.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 2
    Actually I think this problem is a simplified version of the knapsack problem: you only have the weight to consider, not the value, so it should be much easier to solve – Thomas Levesque Sep 29 '10 at 20:24
9

Simple algorithm:

  1. Sort the file list by file size
  2. Find the largest file smaller than the remaining free space on the DVD, and add it to the DVD.
  3. If the remaining DVD free space is smaller than any remaining files, start a new dvd.
  4. Repeat from 2.
dthorpe
  • 35,318
  • 5
  • 75
  • 119
  • 1
    Sorting won't work. Consider: 2 large files may fill up everything except 1 byte of the DVD (which is a great fit), but if there are 3 medium files that can fill it up perfectly, that is technically a better fit... and it works in the opposite direction too. I need to fill it up as completely as possible before moving on to the next disc. – Jason Thuli Sep 29 '10 at 20:23
  • 2
    @Jason Thuli But you need to ask yourself what is your goal; the maximum number of bytes on a DVD or the minimum number of DVD's? this formula satisfies the second condition but not the first (but do you really need the first?) – Scott Chamberlain Sep 29 '10 at 20:26
  • i was just going to add that very same thing as I tried to work it out on my own for fun... my first idea, which isn't likely the best way, is to ask yourself "is there 1 file that takes up all the space?" and then "how about 2?" and just keep going – Aaron Anodide Sep 29 '10 at 20:36
  • My concern on the 2nd bullet point. Finding the "largest file smaller than the remaining free space" might not fit as well as multiple smaller files. So, if I were to use the multiple smaller files, I would have less overall bytes remaining to fit onto disc #2. Now, in this example, if I were to use the largest fittable file, I would have more bytes remaining for disc #2, which means those extra bytes could possibly require a disc #3. Granted, there is a very, VERY slim chance of this, but I want to be as efficient as possible. – Jason Thuli Sep 29 '10 at 20:37
  • 1
    @Jason As you said in another answer comment, the math for the perfect solution is pretty heady. Mathematically perfect is great, but when simple gets you 95% of the way to perfect, how much value is in that last, most difficult 5%? Mathematically perfect has value when you're talking about mfg 10 million DVDs, but not when you're worried about eliminating one 50 cent DVD-R from a backup file set of 20 DVDs. How much effort is worth saving that 50 cents? – dthorpe Sep 29 '10 at 20:39
  • You also need to take into account the DVD file system overhead per file, block sizes, etc. If you have 30MB left, you probably can't put 3 10MB files in there because there will be some bytes lost to the file directory entries and block size remainders. – dthorpe Sep 29 '10 at 20:43
  • dthorpe. I completely agree with you. And I'm sure I'll be using your approach until I can figure out that last 5%. I just wanted to see if anybody else out there can point me in the right direction. And good point about the DVD overhead, I never thought of that. – Jason Thuli Sep 29 '10 at 20:47
  • I'm going to retract my concern with this solution. After sitting down and drawing things out on paper, my concern doesn't make sense. This solution should work perfectly. Thanks! – Jason Thuli Sep 29 '10 at 21:07
  • Woops. typo in step 3. if the free space is SMALLER than any remaining file (not greater). Edited. – dthorpe Sep 29 '10 at 21:33
  • @Jason Thuli if you have a chance of identical files, IIRC DVDs optimize them, so that and the file system overhead are both things to keep in mind. – Rangoric Sep 30 '10 at 13:00
2

For anyone still interested in this question... I wrote a utility which I used for a similar purpose of fitting files into a set of disks/discs. It uses a command-line/file-based interface. Versions are available in C, C++, & Java (not C#).

http://whizman.com/code/diskfit.tgz

More detailed information is in the diskfit.tgz:Doc/diskfit.txt file.

(AGPL3)

We might characterize the question as 0-1 multiple-knapsack, or linear bin packing. (Thanks jon-skeet for the link about knapsack problem.)

Dthorpe solves linear bin packing, for exactly enough bins/disks to fit all files [nicely O(n) or O(n lg n) fast - also may be feasible in spreadsheet without having to write a script].

Basically, diskfit (above-linked utility) outputs qualifying file-sets based around 0-1 single-knapsack, and the user chooses one-disk file-sets to assemble into the disk-set - assisting the user (but not fully automating) toward both:

  • linear bin packing - for the complete disk set;
  • 0-1 multiple-knapsack - for each subset of disks 1..k of the full disk set (where files are prioritized, aka differ in value).

Full programmatic choice of the complete such disk-set, would be an additional feature. It would be insufficient to apply 0-1 single-knapsack solution, automatically disc by disc [greedily]. (Consider 3 knapsacks of capacity 6, and available items with equal value and weights of: {1, 1, 2, 2, 3, 4, 5}. Applying 0-1 knapsack to the first knapsack in isolation would choose {1, 1, 2, 2} to obtain sum value 4 - after which we cannot fit all of the remaining 3 items in the remaining second & third knapsacks - whereas we know we can fit all items in the 3 knapsacks as {1, 2, 3} & {1, 5} & {2, 4}.)

Randall Whitman
  • 411
  • 2
  • 13
1
for each file
 is there enough room this dvd?
   yes, store it here
   no, is there room on another already allocated dvd?
     yes, store it there
     no, allocate another dvd and store it there
Aaron Anodide
  • 16,906
  • 15
  • 62
  • 121
  • Thanks, but it's not quite as simple as that. Just because the first file CAN fit, doesn't mean it's part of the overall best possible combination of files. I need to check all of the files in groups to find the best fit. – Jason Thuli Sep 29 '10 at 20:15
  • That won't really solve the problem, that will only fill up a bunch of DVDs but it won't do the Tetris Math on it. – Rangoric Sep 29 '10 at 20:16
  • interesting... i figured i missed something... and i assume you don't get the answer by sorting them from large to small before each iteration, right? that was just my next thought... i love learning how much i don't know :) – Aaron Anodide Sep 29 '10 at 20:18
  • Rangoric, Tetris Math is a perfect description. Gabriel, I'm with you! I consider myself a decent developer, but this one is stumping me! Sorting won't work. Consider: 2 large files may fill up everything except 1 byte of the DVD (which is a great fit), but if there are 3 medium files that can fill it up perfectly, that is technically a better fit... and it works in the opposite direction too. – Jason Thuli Sep 29 '10 at 20:22
  • @Rangoric: interesting. what if after i'm done i start asking 'which n files can be replaced by one file?' and swap them, with increasing n? would that converge? i'll go read about it! – Aaron Anodide Sep 29 '10 at 20:23
  • Here's a thought: Maybe the fact that this problem isn't too easy is reflected by the fact that disks require degragging? I'm thinking of DVDs as drive cylinders... – Aaron Anodide Sep 29 '10 at 20:40
  • @Gabriel It would take quite a few rounds, as there is the possibility that you can't fit them into the theoretical smallest number of disks because the file sizes won't allow it. Even if you do it 1 disk at a time you can get good result, but it won't be a best fit for all disks. – Rangoric Sep 30 '10 at 12:58
1

While thats a cool problem to solve in a program for certain applications... however in your application, why not just use WinRAR or some other archiving program that has the capability to split up the archive into specific sized file chunks. You could make each chunk the size of a DVD and then just burn away.

EDIT: one issue you would run into is that if one of your files is greater than the size of your media, you are not going to be able to burn that file.

Jonathan S.
  • 541
  • 1
  • 3
  • 13
  • 1
    Thanks for the reply, but WinRAR kind of acts like HJSplit. It essentially makes 1 giant file, and splits that file into smaller segments. That means I wouldn't be able to view any of the contents without having access to the remainder of the giant file. In my problem, the files need to be free and accessable. – Jason Thuli Sep 29 '10 at 20:25
  • 1
    this is true... but what about the scenario where your file is larger than the size of the media? – Jonathan S. Sep 29 '10 at 21:16
0

use backtracking to get the optimal set of files to burn to dvd 1, then exclude them from the list and use backtracking on the remaining files to get the optimal fill for dvd 2 and so on

j0k
  • 22,600
  • 28
  • 79
  • 90
  • 1
    I imagine there are cases where an optimal set for DVD 1 would steal files from DVD 2 that could make an even better set there. Unless you're talking about going through every possible combination of files...in which case you're back to an only slightly smaller version of the original problem. – cHao Sep 22 '12 at 05:58
0

I've found a lot of tools that are supposed to solve this problem, but they all try to minimize the TOTAL number of disc used, while I was just interested into the SINGLE subset of files that best fit a SINGLE disc.

So i've ended writing my own tool called "ss" (from the "subset sum" algorithm which is based from). The tool is still buggy and can't recurse directories, but it's working for me. :)

eadmaster
  • 1,347
  • 13
  • 23
0

This problem is the Bin Packing Problem and is NP hard, which means if you want a truly optimal solution you will need exponential time. However there are methods that give less than optimal solutions but run much faster.

Assume we have an unbounded list of disks. Take each file ordered descending in size, then add each file to the first disk that it fits in. This is called First fit decreasing and takes 11/9 OPT + 6/9 disks in worst case. If you choose files in a random order you instead need 11/9 OPT + 1 disks.

There are algorithms that will pack things tighter, see the wikipedia link above for more details.

droid
  • 291
  • 2
  • 5
  • The packing problem isn't a good answer to this question because is a much more general problem than the bin packing problem. – droid Aug 06 '14 at 23:02
0

How about if you started by putting as many of the largest files you can onto one DVD and then filling it up with as many of the smallest files that you can (starting with the smallest).

Repeat this process with the remaining files for each disk.

I'm not sure that's going to give you perfect coverage/distribution but I think it might go some way to solving your needs.

Chris Schumann
  • 146
  • 1
  • 6