
I want to transform a flat list into a hierarchical structure. How can I do this in a way that is readable yet still performs acceptably, and are there any .NET libraries I can take advantage of? I think this is called a "facet" in some terminologies (in this case, faceting by Industry).

public class Company
{        
    public int CompanyId { get; set; }
    public string CompanyName { get; set; }
    public Industry Industry { get; set; }
}

public class Industry
{
    public int IndustryId { get; set; }
    public string IndustryName { get; set; }
    public int? ParentIndustryId { get; set; }
    public Industry ParentIndustry { get; set; }
    public ICollection<Industry> ChildIndustries { get; set; }
}

Now let's say I have a List<Company> and want to transform it into a List<IndustryNode>:

//Hierarchical data structure
public class IndustryNode
{
    public string IndustryName{ get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes{ get; set; }
}

The resulting object should look like the following after it is serialized:

{
    IndustryName: "Industry",
    ChildIndustryNodes: [
        {
            IndustryName: "Energy",
            ChildIndustryNodes: [
                {
                    IndustryName: "Energy Equipment & Services",
                    ChildIndustryNodes: [
                        { IndustryName: "Oil & Gas Drilling", Hits: 8 },
                        { IndustryName: "Oil & Gas Equipment & Services", Hits: 4 }
                    ]
                },
                {
                    IndustryName: "Oil & Gas",
                    ChildIndustryNodes: [
                        { IndustryName: "Integrated Oil & Gas", Hits: 13 },
                        { IndustryName: "Oil & Gas Exploration & Production", Hits: 5 },
                        { IndustryName: "Oil & Gas Refining & Marketing & Transportation", Hits: 22 }
                    ]
                }
            ]
        },
        {
            IndustryName: "Materials",
            ChildIndustryNodes: [
                {
                    IndustryName: "Chemicals",
                    ChildIndustryNodes: [
                        { IndustryName: "Commodity Chemicals", Hits: 24 },
                        { IndustryName: "Diversified Chemicals", Hits: 66 },
                        { IndustryName: "Fertilizers & Agricultural Chemicals", Hits: 22 },
                        { IndustryName: "Industrial Gases", Hits: 11 },
                        { IndustryName: "Specialty Chemicals", Hits: 43 }
                    ]
                }
            ]
        }
    ]
}

Where "Hits" are the number of companies that fall into that group.

To clarify, I need to transform a List<Company> into a List<IndustryNode>, NOT serialize a List<IndustryNode>.

parliament

4 Answers


Try this:

    private static IEnumerable<Industry> GetAllIndustries(Industry ind)
    {
        yield return ind;
        foreach (var item in ind.ChildIndustries)
        {
            foreach (var inner in GetAllIndustries(item))
            {
                yield return inner;
            }
        }
    }

    private static IndustryNode[] GetChildIndustries(Industry i)
    {
        return i.ChildIndustries.Select(ii => new IndustryNode()
        {
            IndustryName = ii.IndustryName,
            Hits = counts[ii],
            ChildIndustryNodes = GetChildIndustries(ii)
        }).ToArray();
    }


    private static Dictionary<Industry, int> counts;
    static void Main(string[] args)
    {
        List<Company> companies = new List<Company>();
        //...
        var allIndustries = companies.SelectMany(c => GetAllIndustries(c.Industry)).ToList();
        HashSet<Industry> distinctInd = new HashSet<Industry>(allIndustries);
        counts = distinctInd.ToDictionary(e => e, e => allIndustries.Count(i => i == e));
        var listTop = distinctInd.Where(i => i.ParentIndustry == null)
                        .Select(i =>  new IndustryNode()
                                {
                                    ChildIndustryNodes = GetChildIndustries(i),
                                    Hits = counts[i],
                                    IndustryName = i.IndustryName
                                }
                        );
    }

untested

Ahmed KRAIEM
  • `distinctInd.Where(i => i.ParentIndustry == null)` doesn't match any elements because the companies never reference any top-level Industry elements. I've been trying to make it work another way but am still having a lot of difficulty. – parliament Oct 15 '13 at 20:15
  • Try `distinctInd.Where(i => i.ChildIndustries == null || i.ChildIndustries.Count == 0)` – Ahmed KRAIEM Oct 16 '13 at 07:56
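The problem raised in the comments above (companies only reference leaf industries, so no top-level industry ever appears in the set) can be worked around by walking up `ParentIndustry` instead of down `ChildIndustries`. The following is only a minimal sketch under stated assumptions: `IndustryTreeBuilder` and `Build` are hypothetical names, `IndustryId` is assumed to be unique and non-negative, `ParentIndustry` is assumed to be populated, and a company is counted at every ancestor level (the sample output in the question only shows `Hits` on leaves, so this roll-up is a guess at the intent).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Industry
{
    public int IndustryId { get; set; }
    public string IndustryName { get; set; }
    public Industry ParentIndustry { get; set; }
}

public class Company
{
    public string CompanyName { get; set; }
    public Industry Industry { get; set; }
}

public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

// Hypothetical helper, not part of any library.
public static class IndustryTreeBuilder
{
    public static List<IndustryNode> Build(IEnumerable<Company> companies)
    {
        var hits = new Dictionary<int, int>();      // IndustryId -> company count
        var byId = new Dictionary<int, Industry>(); // every industry seen, ancestors included

        foreach (var company in companies)
        {
            // Walk from the company's industry up to the root,
            // counting the company once at each level.
            for (var ind = company.Industry; ind != null; ind = ind.ParentIndustry)
            {
                byId[ind.IndustryId] = ind;
                hits[ind.IndustryId] = hits.TryGetValue(ind.IndustryId, out var n) ? n + 1 : 1;
            }
        }

        // Group the seen industries by parent id; -1 stands for "no parent"
        // (assumes real ids are never negative).
        var childrenOf = byId.Values.ToLookup(i => i.ParentIndustry?.IndustryId ?? -1);

        // Recursively convert an industry and its seen descendants into nodes.
        IndustryNode ToNode(Industry i) => new IndustryNode
        {
            IndustryName = i.IndustryName,
            Hits = hits[i.IndustryId],
            ChildIndustryNodes = childrenOf[i.IndustryId].Select(ToNode).ToArray()
        };

        return childrenOf[-1].Select(ToNode).ToList();
    }
}
```

Each company costs one step per level of depth, so the whole build is roughly linear in the number of companies for a shallow hierarchy. To count only the companies assigned directly to an industry, increment `hits` for `company.Industry` alone and keep the loop just for filling `byId`.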

You are looking for a serializer. MSFT has one that is native to VS, but I prefer Newtonsoft's, which is free. MSFT documentation and examples are here; Newtonsoft documentation is here.

Newtonsoft is free, easy and faster.

CodeChops
  • I really don't like someone giving me a minus one with no reason. If you don't have a reason, don't vote it down. – CodeChops Oct 15 '13 at 15:34
  • I didn't downvote but the answer is not helpful. I'll already be using JSON.NET to serialize but I still need to get it into the proper structure. – parliament Oct 15 '13 at 15:40
  • That wasn't clear in the original post (as evidenced in half the answers). It sounded like you were looking for performance. Sorry I misunderstood your question. I still think it's lousy to minus one anything and not explain your reason. – CodeChops Oct 15 '13 at 16:28

Try using a JSON serializer for this purpose. Your data structure looks OK, so this is just a matter of serialization.

var industryNodeInstance = LoadIndustryNodeInstance();

var json = new JavaScriptSerializer().Serialize(industryNodeInstance);

If you want to choose between serializers please see this: http://www.servicestack.net/benchmarks/#burningmonk-benchmarks

The LoadIndustryNodeInstance method would:

  • Build a List<Industry>

  • Convert it into the industry tree, i.e. a List<IndustryNode>

  • Implement tree methods, such as Traverse. Try looking at tree data structures in C#
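As an illustration of the last step, here is a minimal sketch of a `Traverse` method for the `IndustryNode` type from the question. The extension class name and the choice of depth-first, pre-order traversal are assumptions; the answer does not say which order it has in mind.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

// Hypothetical extension class, not part of any library.
public static class IndustryNodeExtensions
{
    // Depth-first, pre-order traversal: yields a node, then each of its subtrees.
    // An explicit stack is used instead of recursion so deep trees
    // do not create nested iterators.
    public static IEnumerable<IndustryNode> Traverse(this IndustryNode root)
    {
        var stack = new Stack<IndustryNode>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            var node = stack.Pop();
            yield return node;

            // Leaf nodes may have a null child array, as in the question's sample.
            var children = node.ChildIndustryNodes ?? Array.Empty<IndustryNode>();

            // Push in reverse so children are visited in their original order.
            for (int i = children.Length - 1; i >= 0; i--)
            {
                stack.Push(children[i]);
            }
        }
    }
}
```

With this in place, something like `root.Traverse().Where(n => n.ChildIndustryNodes == null || n.ChildIndustryNodes.Length == 0).Sum(n => n.Hits)` would total the hits over the leaf nodes.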

Alexandr

Here is some pseudocode that might get you along the way. I create a map/dictionary index and populate it with the company list. Then we extract the top-level nodes from the index. Note that there may be edge cases (for example, this index may need to be partially filled initially, as it doesn't seem any of your companies ever reference the very top-level nodes, so those will have to be filled in some other way).

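// Requires: using System.Collections.Generic; using System.Linq;

// Node index keyed by industry name, plus the source Industry for each key so
// that top-level entries can be identified later (IndustryNode has no parent link).
// Any pre-filled top-level entries must be added to both dictionaries.
Dictionary<string, IndustryNode> index = new Dictionary<string, IndustryNode>();
Dictionary<string, Industry> industries = new Dictionary<string, Industry>();

public void Insert(Company company)
{
    Industry industry = company.Industry;
    if (index.ContainsKey(industry.IndustryName))
    {
        index[industry.IndustryName].Hits++;
    }
    else
    {
        IndustryNode node = new IndustryNode
        {
            IndustryName = industry.IndustryName,
            Hits = 1,
            ChildIndustryNodes = new IndustryNode[0]
        };
        index[node.IndustryName] = node;
        industries[node.IndustryName] = industry;

        // Attach to the parent only if the parent is already in the index.
        Industry parent = industry.ParentIndustry;
        if (parent != null && index.ContainsKey(parent.IndustryName))
        {
            IndustryNode parentNode = index[parent.IndustryName];
            // ChildIndustryNodes is an array, so append by rebuilding it.
            parentNode.ChildIndustryNodes = (parentNode.ChildIndustryNodes ?? new IndustryNode[0])
                .Concat(new[] { node })
                .ToArray();
        }
    }
}

List<IndustryNode> topLevelNodes = index
    .Where(kvp => industries[kvp.Key].ParentIndustry == null)
    .Select(kvp => kvp.Value)
    .ToList();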
CookieOfFortune
  • This solution will not take into account children of children of an industry if it is not affected to a company. – Ahmed KRAIEM Oct 15 '13 at 16:13
  • @AhmedKRAIEM True, those would have to be inserted initially. – CookieOfFortune Oct 15 '13 at 16:14
  • Thanks for the answer, if this method took an Industry instead where could recursion be applied to handle the children of children case? – parliament Oct 15 '13 at 20:25
  • Can you explain further? The way the data is currently presented, recursion isn't immediately usable. Eg. You could search a tree via recursion but that won't be different from a linear search since there's no stated search order. – CookieOfFortune Oct 15 '13 at 21:32
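One way the recursion asked about in the comments could look, if the method takes an `Industry` rather than a `Company`, is a top-down build over `ChildIndustries`. This is only a sketch under assumptions not confirmed in the thread: `IndustryRecursion.ToNode` is a hypothetical name, `directHits` is assumed to be a precomputed map from `IndustryId` to the number of companies assigned directly to that industry, and branches with no companies are assumed to be unwanted in the output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Industry
{
    public int IndustryId { get; set; }
    public string IndustryName { get; set; }
    public ICollection<Industry> ChildIndustries { get; set; }
}

public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

// Hypothetical helper, not part of any library.
public static class IndustryRecursion
{
    // Converts one industry and all of its descendants into a node tree.
    // A node's Hits is its own direct count plus the Hits of its children.
    public static IndustryNode ToNode(Industry industry, IReadOnlyDictionary<int, int> directHits)
    {
        var childIndustries = (IEnumerable<Industry>)industry.ChildIndustries
                              ?? Enumerable.Empty<Industry>();

        var children = childIndustries
            .Select(child => ToNode(child, directHits)) // recurse into children of children
            .Where(node => node.Hits > 0)               // drop branches no company falls into
            .ToArray();

        // 'own' stays 0 when no company references this industry directly.
        directHits.TryGetValue(industry.IndustryId, out var own);

        return new IndustryNode
        {
            IndustryName = industry.IndustryName,
            Hits = own + children.Sum(c => c.Hits),
            ChildIndustryNodes = children
        };
    }
}
```

The direct counts could come from something like `companies.GroupBy(c => c.Industry.IndustryId).ToDictionary(g => g.Key, g => g.Count())`. The root industries still have to be found separately, for example by following `ParentIndustry` upward from each company's industry until it is null.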