2

I was wondering whether someone may be able to help.

I have a text file with lines into that I have split on \t. They have two columns, a code and a name.

I was hoping to split this one dimensional structure into a parent child hierarchy on the code column.

Example of data being:

 0100000000     Coffee
 0110000000     Mocha
 0120000000     Cappuccino
 0121000000     Semi skimmed
 0121100000     Starbuckz
 0121200000     Costa
 0122000000     Skimmed
 0130000000     Latte

Human readable hierarchy:

 0100000000     Coffee
      0110000000     Mocha
      0120000000     Cappuccino
           0121000000     Semi skimmed
                0121100000     Starbuckz
                0121200000     Costa
           0122000000     Skimmed
      0130000000     Latte

I would like to convert this structure into a format like:

 public class LineData
 {
    public string OriginalCode { get; set; }
    public string Title { get; set; }
    public LineData Parent { get; set; }
    public List<LineData> Children { get; set; }
 }

The list is static and I will probably end up just storing in memory.

Phil
  • 2,315
  • 2
  • 18
  • 26
  • What have you come up with so far? Also, what hierarchy does the data represent? – Andrew Whitaker Dec 26 '13 at 15:39
  • I haven't started yet, I was wondering whether their is some best practice on the way to approach this. For example whether you would work from right to left or left to right, as the codes are padded with zeros to 8 characters. – Phil Dec 26 '13 at 15:41
  • 1
    I don't see the parent child relationship ? – Nix Dec 26 '13 at 15:43
  • i've edited to make it more visible – Phil Dec 26 '13 at 15:45
  • 1
    @Phil a recursive function that takes the list, depth, and starting point and loops until the flag at that depth changes would be a _general_ solution. – D Stanley Dec 26 '13 at 15:46
  • Wouldn't "Latte" be a child of "Coffee" or am I reading it wrong? – Andrew Whitaker Dec 26 '13 at 15:48
  • Latte is a child of coffee as if you stripped the trailing zeros coffee is 01, latte is 013 (anything that starts with 01 is a child of coffee), going from left to right that's the child structure. I am sure this must have some scientific/bona fide structure or name, but i'm just not sure what it is! :) – Phil Dec 26 '13 at 15:51
  • @Andrew - sorry! I forgot to pad in Latte on the example! – Phil Dec 26 '13 at 15:52
  • Do you have schema map? I mean 1st 2chars = category, 3rd = sub category, 4th = alt category... so will it be always single char for down levels? – cilerler Dec 26 '13 at 15:55
  • @Phil I've answered similar question but for php. Algorithm should be the same though. Unless I am missing something [this](http://stackoverflow.com/questions/19927176/transform-flat-array-to-tree-with-one-time-loop/19953819#19953819) should be helpful). – Leri Dec 26 '13 at 15:56
  • @cilerler - correct, first 2 chars for category, then every one character down is the level. Thanks Leri - great I'll have a look – Phil Dec 26 '13 at 15:58
  • What if a line has more than 9 children? – Oblivious Sage Dec 26 '13 at 16:00
  • @ObliviousSage - it doesn't - that would violate the schema ;) – Phil Dec 26 '13 at 16:01

2 Answers2

1

How about this?

var data = " 0100000000     Coffee\r\n 0110000000     Mocha\r\n 0120000000     Cappuccino\r\n 01210" +
    "00000     Semi skimmed\r\n 0121100000     Starbuckz\r\n 0121200000     Costa\r\n 01220" +
    "00000     Skimmed\r\n 0130000000     Latte";
var linesByPrefix = 
    (from l in data.Split(new[]{Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
    let pair = l.Split(new[]{' '},StringSplitOptions.RemoveEmptyEntries)
    select new LineData
    {
        OriginalCode = pair[0],
        Title = pair[1],
        Children = new List<LineData>()
    })
    .ToDictionary(l => l.OriginalCode.TrimEnd('0'));

foreach (var line in linesByPrefix)
{
    var parentCode = line.Key.Substring(0, line.Key.Length - 1);
    LineData parent;
    if(linesByPrefix.TryGetValue(parentCode, out parent))
    {
        line.Value.Parent = parent;
        parent.Children.Add(line.Value);
    }
}
var roots = linesByPrefix.Values.Where(l => l.Parent == null);
StriplingWarrior
  • 151,543
  • 27
  • 246
  • 315
  • 1
    Absolutely fantastic! Many thanks for your time on this. Happy holidays :) – Phil Dec 26 '13 at 16:23
  • 2
    tiny refactoring, **replace lineData & linesByPrefix with** `var linesByPrefix = data.Select(d => d.Split('\t')) .ToDictionary( td => td.First().TrimEnd('0'), td => new LineData { OriginalCode = td.First(), Title = td.Last(), Children = new List() });` – cilerler Dec 26 '13 at 17:17
1

Something like this could work:

var lines = File.ReadAllLines(@"...");
Stack<LineData> parents = new Stack<LineData>();
List<LineData> items = new List<LineData>();

foreach (string line in lines) 
{
    string[] parts = Regex.Split(line, @"\s+");

    string code = parts[0];
    string title = parts[1];

    LineData newItem = new LineData 
    { 
        OriginalCode = code,
        Title = title
    };

    LineData parent = null;

    // Find the parent, if any.
    while (parents.Any() && parent == null)
    {
        LineData temp = parents.Peek();

        if (code.Replace("0", string.Empty).Contains(
            temp.OriginalCode.Replace("0", string.Empty)))
        {
            parent = temp;
        }
        else
        {
            parents.Pop();
        }
    }

    if (parent != null)
    {
        parent.Children.Add(newItem);
    }
    else 
    {
        items.Add(newItem);
    }

    parents.Push(newItem);
}

Basically iterate through every line and keep a stack of ancestors that you continually Pop until you find the correct parent. I've defined "correct parent" as an ancestor with an OriginalCode that's contained in the current item's OriginalCode, minus the zeros.

Note that you'd also have to add a constructor for LineData that initializes Children.

Andrew Whitaker
  • 124,656
  • 32
  • 289
  • 307
  • Sorry Andrew - Stripling Warrior just beat you to it! I took some time to test his code and awarded the answer to him just moments before your posted. Thank you anyway – Phil Dec 26 '13 at 16:29
  • @Phil: No problem at all! – Andrew Whitaker Dec 26 '13 at 16:30
  • +1 for coming up with a better-performing solution. I thought about trying this stack-based approach, but opted for a more LINQ-oriented solution to avoid cyclomatic complexity and support disordered input. :-) – StriplingWarrior Dec 26 '13 at 16:39
  • 1
    @StriplingWarrior: Thanks-- yes this does assume an order to the input which is a downside. I also don't like the check for "Is this a parent." Seems to smell a bit to me. – Andrew Whitaker Dec 26 '13 at 17:23