
Greetings, I'm trying to write a LINQ query to run on a list of filenames which returns the files grouped into 5 MB chunks. So each group will contain a list of filenames whose total (summed) size is at most 5 MB.

I'm okay with LINQ, but with this one I don't know where to begin. Help!

DirectoryInfo di = new DirectoryInfo(@"x:\logs");
FileInfo[] fileList = di.GetFiles("*.xml");
var grouped = fileList => // ??? this is where I'm stuck
BrokenGlass
ZionGates
  • Are there any more constraints? Like a minimal count of groups? If yes, then LINQ is a really bad idea, because such an algorithm is too complex. If no, then LINQ is still a bad idea. LINQ should be used only when you immediately know how to write your query. If you need to think a lot about how to make your query, solving it iteratively would be much safer and quicker. – Euphoric Dec 30 '10 at 17:18

4 Answers


Yeah, you can do this with LINQ.

var groupedFiles = files.Aggregate(
    new List<List<FileInfo>>(),
    (groups, file) => {
        // Find the first group this file fits into without pushing
        // its total past the 5 MB cap.
        List<FileInfo> group = groups.FirstOrDefault(
           g => g.Sum(f => f.Length) + file.Length <= 1024 * 1024 * 5
        );
        if (group == null) {
            // No existing group has room; start a new one.
            group = new List<FileInfo>();
            groups.Add(group);
        }
        group.Add(file);
        return groups;
    }
);

This algorithm is greedy. It just finds the first list it can shove the FileInfo into without blowing past the upper bound of 5 MB. It isn't optimal in terms of minimizing the number of groups, but you didn't state that as a constraint. I think an OrderBy(f => f.Length) before the call to Aggregate would help, but I don't really have time to think deeply about that right now.
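As a sketch of that sorting idea (not part of the original answer): ordering largest-first turns the greedy pass into first-fit decreasing, a standard bin-packing heuristic that usually wastes less space per group. Plain byte counts stand in for FileInfo here so the example is self-contained; GroupBySize and MaxGroupSize are names invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FirstFitDecreasingDemo
{
    // 5 MB cap per group, as in the question.
    const long MaxGroupSize = 5L * 1024 * 1024;

    public static List<List<long>> GroupBySize(IEnumerable<long> sizes)
    {
        // Sorting largest-first (first-fit decreasing) tends to leave
        // less wasted space per group than first-fit in arrival order.
        return sizes
            .OrderByDescending(s => s)
            .Aggregate(new List<List<long>>(), (groups, size) =>
            {
                var group = groups.FirstOrDefault(
                    g => g.Sum() + size <= MaxGroupSize);
                if (group == null)
                {
                    group = new List<long>();
                    groups.Add(group);
                }
                group.Add(size);
                return groups;
            });
    }

    static void Main()
    {
        const long mb = 1024 * 1024;
        foreach (var g in GroupBySize(new[] { 3 * mb, 3 * mb, 2 * mb, 2 * mb }))
            Console.WriteLine(string.Join(" + ", g.Select(s => s / mb)) + " MB");
        // prints "3 + 2 MB" twice: the two 2 MB files fill out the 3 MB groups
    }
}
```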

jason
  • thanks for the response, could you show the imperative perspective please? – ZionGates Dec 30 '10 at 17:17
  • @ZionGates: Actually, I think the LINQ solution is kind of okay. The imperative solution wouldn't be much different. – jason Dec 30 '10 at 17:22
  • I think this solution is an imperative one. You are only replacing foreach with Aggregate; the rest is normal imperative code. – Euphoric Dec 30 '10 at 17:27
  • @Euphoric: Agree, which is why I am okay with this solution and stated the imperative solution wouldn't be much different. – jason Dec 30 '10 at 17:28
  • This seems to be the most straightforward approach presented, thanks. – ZionGates Dec 30 '10 at 17:47
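For readers who want the imperative version the comments discuss, here is a minimal sketch (not from the original answers); raw byte counts stand in for FileInfo, and the GroupBySize name and 5 MB limit are illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ImperativeGroupingDemo
{
    const long MaxGroupSize = 5L * 1024 * 1024; // 5 MB cap per group

    public static List<List<long>> GroupBySize(IEnumerable<long> sizes)
    {
        var groups = new List<List<long>>();
        foreach (long size in sizes)
        {
            // First-fit: take the first existing group with enough room.
            List<long> group = null;
            foreach (var g in groups)
            {
                if (g.Sum() + size <= MaxGroupSize)
                {
                    group = g;
                    break;
                }
            }
            if (group == null)
            {
                group = new List<long>();
                groups.Add(group);
            }
            group.Add(size);
        }
        return groups;
    }

    static void Main()
    {
        const long mb = 1024 * 1024;
        var groups = GroupBySize(new[] { 4 * mb, 4 * mb, 1 * mb });
        Console.WriteLine(groups.Count); // 2 groups: [4 MB, 1 MB] and [4 MB]
    }
}
```

As the comments note, this is the same logic as the Aggregate version with the fold written out as loops.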

Look at this StackOverflow question to start with. It addresses grouping into sublists. The trick then is detecting the size of the files in the group by clause. This may be a case where not using LINQ is clearer than using it.

Part of the problem is that you have a list of file names. You need a list of FileInfo objects so you can query the size of each file through LINQ. LINQ's group-by-into construct (available since C# 3.0) should be what you want.
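A minimal sketch of the group-by-into construct mentioned above (my own illustration, with (name, length) tuples in place of real FileInfo objects). Note this groups files by the 5 MB band their *individual* size falls into, not into cumulative 5 MB chunks as the question asks, so it only demonstrates the syntax:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class GroupByIntoDemo
{
    const long BytesPerMb = 1024 * 1024;

    // group ... by ... into in query syntax. The key is the 5 MB band an
    // individual file's size falls into (0 for 0-5 MB, 1 for 5-10 MB, ...).
    public static List<IGrouping<long, (string Name, long Length)>> Bands(
        IEnumerable<(string Name, long Length)> files)
    {
        return (from f in files
                group f by f.Length / (5 * BytesPerMb) into g
                orderby g.Key
                select g).ToList();
    }

    static void Main()
    {
        var files = new[]
        {
            (Name: "a.xml", Length: 2 * BytesPerMb),
            (Name: "b.xml", Length: 7 * BytesPerMb),
            (Name: "c.xml", Length: 1 * BytesPerMb),
        };

        foreach (var band in Bands(files))
            Console.WriteLine("Band {0}: {1}", band.Key,
                string.Join(", ", band.Select(f => f.Name)));
        // Band 0: a.xml, c.xml
        // Band 1: b.xml
    }
}
```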

Berin Loritsch

Here's one way:

  1. Define a type that takes a file size as input and returns a value which increments as a specified max is reached and resets. (This type is responsible for maintaining its own state.)
  2. Group by the values returned by this type.

Code example:

// No idea what a better name for this would be...
class MaxAmountGrouper
{
    readonly int _max;

    int _id;
    int _current;

    public MaxAmountGrouper(int max)
    {
        _max = max;
    }

    public int GetGroupId(int amount)
    {
        // If this amount would push the running total past the max,
        // close the current group and start a new one (unless the
        // current group is still empty), so no group exceeds the cap.
        if (_current > 0 && _current + amount > _max)
        {
            _current = 0;
            _id++;
        }

        _current += amount;
        return _id;
    }
}

Usage:

const int BytesPerMb = 1024 * 1024;

DirectoryInfo directory = new DirectoryInfo(@"x:\logs");
FileInfo[] files = directory.GetFiles("*.xml");

var grouper = new MaxAmountGrouper(5 * BytesPerMb);
var groups = files.GroupBy(f => grouper.GetGroupId((int)f.Length));

foreach (var g in groups)
{
    long totalSize = g.Sum(f => f.Length);
    Console.WriteLine("Group {0}: {1} MB", g.Key, totalSize / BytesPerMb);
    foreach (FileInfo f in g)
    {
        Console.WriteLine("File: {0} ({1} MB)", f.Name, f.Length / BytesPerMb);
    }
    Console.WriteLine();
}
Dan Tao

I would first throw the file list into a SQL table. Something like this, but with the size column included:

CREATE TABLE #DIR (fileName varchar(100))

INSERT INTO #DIR
EXEC master..xp_CmdShell 'DIR C:\RTHourly\*.xml /B'

Then it would be a SELECT statement, something like:

SELECT *,
CASE WHEN SIZE < 5 THEN 1
WHEN SIZE < 10 THEN 2
...
END AS Grouping
FROM #DIR
ORDER BY Grouping, FileName, Size

There is a security setting you have to change real quick on SQL Server to do this (xp_cmdshell is disabled by default). See the blog posting HERE.

JBrooks