2

I have a datatable that has a traverse tree structure on it, it has lots of columns but the ones that matter are as follows: |RepID|LeaderID|Depth|

I want to set the tree's depth as: RepID that has no LeaderID (null) should be depth 0 RepID that has as a leader the ones that have depth 0 should be depth 1 and so on

Up to now I made a recursive algorithm that is getting the job done, however since I'm iterating over 400,000 rows it's taking a lot of time.

Up to now my code is as follows:

public static DataTable SetTreeLevel(DataTable dtReps)
{
    //Getting the 1st level (reps without a leader)
    var first_level = from r in dtReps.AsEnumerable()
                      where (r.Field<int?>("LeaderID") == null || r.Field<int?>("LeaderID") == 0)
                      select r;

    //Setting the level for the reps obtained
    foreach (var row in first_level)
    {
        row["Depth"] = 1;
    }

    //Setting the next levels
    return setTreeLevelRecursive(dtReps, 2);
}

private static DataTable setTreeLevelRecursive(DataTable dtReps, int depth)
{
    //Getting reps of the last level (depth -1)
    var last_level = from r in dtReps.AsEnumerable()
                     where r.Field<int?>("Depth") == depth - 1
                        select r.Field<int?>("RepID");

    //List to improve performance
    List<int?> last_level_list = last_level.ToList<int?>();

    //Getting the next level reps (leader is on the last level list)
    var actual_level = from r in dtReps.AsEnumerable()
                      where last_level_list.Contains(r.Field<int?>("LeaderID"))
                      select r;

    //List to improve performance
    List<DataRow> actual_level_list = actual_level.ToList<DataRow>();

    foreach (DataRow row in actual_level_list)
    {
        row["Depth"] = depth;
    }

    //Validating if there are reps without depth
    if ((from r in dtReps.AsEnumerable()
         where r.Field<int?>("Depth") == null
         select r).Count() > 0)
    {
        //Asignando siguiente nivel
        setTreeLevelRecursive(dtReps, depth + 1);
    }

    //Regresando resultado
    return dtReps;
}

Edit: Using Servy's optimization I coded the following:

var lookup = dtReps.AsEnumerable().ToLookup(x => x.Field<int?>("LeaderID"));

            //First level
            var first_level = from r in dtReps.AsEnumerable()
                              where (r.Field<int?>("LeaderID") == null || r.Field<int?>("LeaderID") == 0)
                              select Tuple.Create(r.Field<int>("RepID"), 1);

            var rows = Traverse(first_level, node => lookup[node.Item1]
                .Select(row => Tuple.Create(row.Field<int>("RepID"), node.Item2 + 1))).ToList();

            foreach (var r in rows)
            {
                (from r_nivel in dtReps.AsEnumerable()
                 where r_nivel.Field<int>("RepID") == r.Item1
                 select r_nivel).FirstOrDefault()["Depth"] = r.Item2;
            }

But the foreach takes a lot of time Thanks!

Daniel Martinez
  • 397
  • 4
  • 20
  • Since the code you have now is getting the job done, I am assuming that your question is how to make the code more efficient? – Boluc Papuccuoglu Dec 18 '13 at 16:44
  • Yes, I updated the question's title. It takes more than 30minutes to complete this task! – Daniel Martinez Dec 18 '13 at 16:49
  • You asked a question about this just yesterday, and it solved this exact problem for you. – Servy Dec 18 '13 at 17:00
  • @Servy: Actually that was another problem. Your solution works perfectly and really fast when reading the tree (my need on yesterday's question). But when updating a table it takes more time than the recursive function (or maybe I didn't implement it right) – Daniel Martinez Dec 18 '13 at 17:11
  • @DanielMartinez What you are doing here is still traversing the tree. (You also reverted back to the traversal method you were using before, rather than the optimization I showed you how to use.) You're just doing something with the results of that traversal (namely, updating the item) rather than just returning the results. If you need to update the item instead, you can still do that. The easiest way would be to simply have the function I provided return the entire `DataRow`, along with the depth, and then you can simply `foreach` over that and set the row's value to the depth. – Servy Dec 18 '13 at 17:14
  • @Servy I'm using your optimization on another method and it works as a charm, it greatly improved performance. But, for this specific function, I used the same code you provided (returning the Tuple) and I foreach over the 400k tuples and updated the row, however it takes about 1.5 hours to cmplete the foreach – Daniel Martinez Dec 18 '13 at 17:25

1 Answers1

0

First, you can define a Rep object:

public class Rep
{
    public int RepID {get;set;}
    public int LeaderID {get;set;}
    public int Depth {get;set;}
}

Then you populate a List from your DataTable using:

List<Rep> Reps=dtReps.OfType<DataRow>().Select(c=>new Rep(){RepID=Convert.ToInt32(c["RepID"]),LeaderID=Convert.ToInt32(c["LeaderID"])).ToList();

Then you create a lookup table for each leader's reps by:

Dictionary<int,List<Rep>> LeaderLookup=Reps.GroupBy(c=>c.LeaderID).ToDictionary(c=>c.Key,c=>c.ToList());

Now you use LeaderLookup to recursively set Depths. This is very fast, I am using similar data with around 3,000 items, and it can be populate under a second.

To do that, you define this function:

private void RecursivelySetDepthOfChildren(int RepID,int CurrentDepth,Dictionary<int,<List<Rep>> LeaderLookup)
{
    int DepthOfChildren=CurrentDepth+1;
    foreach(Rep child in LeaderLookup[RepID])
    {
         child.Depth=DepthOfChildren;
         RecursivelySetDepthOfChildren(child.RepID,child.Depth,LeaderLookup);
    }
}

Then you invoke the function with:

RecursivelySetDepthOfChildren(0,0,LeaderLookup);

after the ToDictionary statement. After that call completes you will have a List with depths all set correctly. You can iterate through that to save it to the Database.

How long does that take?

Boluc Papuccuoglu
  • 2,318
  • 1
  • 14
  • 24
  • Rather than `GroupBy..ToDictionary` you can just use `ToLookup` and do it all in one go. – Servy Dec 18 '13 at 16:58
  • Please see my edit, I guess the problem might be when recursively setting depths. Is it right to use LINQ on the whole table to fing the correct row? – Daniel Martinez Dec 18 '13 at 17:53
  • @DanielMartinez, added code, please have a go. – Boluc Papuccuoglu Dec 18 '13 at 21:03
  • @BolucPapuccuoglu It takes the same time because the same iteration takes place, the problem isn't assigning the depth to a list or a lookup but to iterate on every row to update the datatable. I was wondering if there's a way to assign the depth directly to the datatable with the same speed? Updating every row recursively takes a lot – Daniel Martinez Dec 18 '13 at 22:14