
I'm trying to convert a string into a hierarchical list where each line in the string represents a single item within the hierarchy.

For example, say I have the following string:

1 - First Level 1
2 - First Level 1 > Second Level 1
3 - First Level 1 > Second Level 1 > Third Level 1
4 - First Level 1 > Second Level 2
5 - First Level 2
6 - First Level 2 > Second Level 1
7 - First Level 2 > Second Level 1 > Third Level 1
...

I need to convert it to a list of the following type:

public class Category {
    public int Id { get; set; }
    public string Name { get; set; }
    public Category Parent { get; set; }
}

A category name cannot include the - or > characters.

E.g. the following line:

3 - First Level 1 > Second Level 1 > Third Level 1

Would add a category to the list with an id of 3, a name of "Third Level 1" and the Parent would point to the category where the name is "Second Level 1" (id = 2 in the example above and not id = 6).

Please note that multiple categories may share the same name, so the whole path must be looked up to find the parent.

So far I have managed to split the string into lines, and for each line I do another split on the hyphen to get the id and the full category name. I can then split on the greater-than symbol to retrieve the category parts. I take the last part as the category name, and if there is more than one part I know I need to look up the parent.

This is where I get lost: I now need to use the remaining parts to work out the parent, taking into consideration that multiple categories may have the same name.

I'd appreciate it if someone could show me how this can be done. Thanks

nfplee

2 Answers


If I understood your problem statement correctly, this code should work:

var strings = File.ReadAllLines(@"C:\YourDirectory\categories.txt");

var categories = new List<Category>();

foreach (var line in strings)
{
    var category = new Category(); // e.g. line = "3 - First Level 1 > Second Level 1 > Third Level 1"
    var cats = line.Split('>').ToList(); // "3 - First Level 1", "Second Level 1", "Third Level 1"
    category.Id = int.Parse(cats.First().Split('-').First().Trim()); //3

    if (cats.Count > 1)
    {
        category.Name = cats.Last().Trim(); //Third Level 1
        var parentStr = cats.ElementAt(cats.Count - 2).Trim();
        if (parentStr.Contains('-'))
            parentStr = parentStr.Split('-').Last().Trim();
        category.Parent = categories.FirstOrDefault(c => c.Name == parentStr);
    }
    else
        category.Name = cats.First().Split('-').Last().Trim(); //for 1 - First Level 1

    categories.Add(category);
}

Update

After clarification (the code above picks the wrong parent when category names are duplicated), this is the changed code:

var lines = File.ReadAllLines(@"C:\YourDirectory\categories.txt");
var lookup = new List<KeyValuePair<List<string>, Category>>(); //key = parents in order

foreach (var line in lines)
{
    var category = new Category(); // e.g. line = "3 - First Level 1 > Second Level 1 > Third Level 1"
    var parts = line.Split('>').ToList(); // "3 - First Level 1", "Second Level 1", "Third Level 1"
    category.Id = int.Parse(parts.First().Split('-').First().Trim()); //3

    if (parts.Count > 1) //has parent
    {
        category.Name = parts.Last().Trim(); //Third Level 1
        if (parts.Count == 2) //has one level parent
        {
            var parentStr = parts.First().Split('-').Last().Trim();
            if (lookup.Any(l => l.Value.Parent == null && l.Value.Name == parentStr))
            {
                var parent = lookup.First(l => l.Value.Parent == null && l.Value.Name == parentStr);
                category.Parent = parent.Value;
                lookup.Add(new KeyValuePair<List<string>,Category>(new List<string> { parent.Value.Name }, category));
            }
        }
        else //has multi level parent
        {
            // Every part before the direct parent; the first part still carries the "id - " prefix, so strip it
            var higherAncestors = parts.Take(parts.Count - 2).Select(a => a.Split('-').Last().Trim()).ToList();
            var parentStr = parts.Skip(parts.Count - 2).First().Trim();
            if (lookup.Any(l => MatchAncestors(l.Key, higherAncestors) && l.Value.Name == parentStr))
            {
                var parent = lookup.First(l => MatchAncestors(l.Key, higherAncestors) && l.Value.Name == parentStr);
                category.Parent = parent.Value;
                var ancestors = parent.Key.ToList();
                ancestors.Add(parent.Value.Name);
                lookup.Add(new KeyValuePair<List<string>, Category>(ancestors, category));
            }
        }
    }
    else //no parent
    {
        category.Name = parts.First().Split('-').Last().Trim(); //for 1 - First Level 1
        lookup.Add(new KeyValuePair<List<string>, Category>(new List<string>(), category));
    }
}

var categories = lookup.Select(l => l.Value); //THIS IS YOUR RESULT

private static bool MatchAncestors(List<string> ancestors1, List<string> ancestors2)
{
    if (ancestors1.Count != ancestors2.Count)
        return false;
    for (int i = 0; i < ancestors1.Count; i++)
    {
        if (ancestors1[i] != ancestors2[i])
            return false;
    }
    return true;
}
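
As a side note, the element-by-element comparison in MatchAncestors does the same job as LINQ's Enumerable.SequenceEqual, which checks that two sequences have the same length and equal elements in the same order. A minimal sketch, assuming System.Linq is available; the class name AncestorMatching is hypothetical and only there to make the snippet self-contained:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AncestorMatching
{
    // Same contract as the hand-written MatchAncestors above:
    // true only when both lists have the same names in the same order.
    public static bool MatchAncestors(List<string> ancestors1, List<string> ancestors2)
    {
        return ancestors1.SequenceEqual(ancestors2);
    }
}
```

For strings the default comparer is ordinal and case-sensitive, so this should behave like the != comparison in the loop.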

For this test data:

1 - First Level 1
2 - First Level 1 > Second Level 1
3 - First Level 1 > Second Level 1 > Third Level 1
4 - First Level 1 > Second Level 2
5 - First Level 2
6 - First Level 2 > Second Level 1
7 - First Level 2 > Second Level 1 > Third Level 1
8 - First Level 2 > Second Level 1 > Third Level 1 > Fourth Level 1
9 - First Level 1 > Second Level 1 > Third Level 1 > Fourth Level 2

This is the lookup value (as JSON):

[
  {
    "Key": [],
    "Value": {
      "Id": 1,
      "Name": "First Level 1",
      "Parent": null
    }
  },
  {
    "Key": ["First Level 1"],
    "Value": {
      "Id": 2,
      "Name": "Second Level 1",
      "Parent": {
        "Id": 1,
        "Name": "First Level 1",
        "Parent": null
      }
    }
  },
  {
    "Key": ["First Level 1","Second Level 1"],
    "Value": {
      "Id": 3,
      "Name": "Third Level 1",
      "Parent": {
        "Id": 2,
        "Name": "Second Level 1",
        "Parent": {
          "Id": 1,
          "Name": "First Level 1",
          "Parent": null
        }
      }
    }
  },
  {
    "Key": ["First Level 1"],
    "Value": {
      "Id": 4,
      "Name": "Second Level 2",
      "Parent": {
        "Id": 1,
        "Name": "First Level 1",
        "Parent": null
      }
    }
  },
  {
    "Key": [],
    "Value": {
      "Id": 5,
      "Name": "First Level 2",
      "Parent": null
    }
  },
  {
    "Key": ["First Level 2"],
    "Value": {
      "Id": 6,
      "Name": "Second Level 1",
      "Parent": {
        "Id": 5,
        "Name": "First Level 2",
        "Parent": null
      }
    }
  },
  {
    "Key": ["First Level 2","Second Level 1"],
    "Value": {
      "Id": 7,
      "Name": "Third Level 1",
      "Parent": {
        "Id": 6,
        "Name": "Second Level 1",
        "Parent": {
          "Id": 5,
          "Name": "First Level 2",
          "Parent": null
        }
      }
    }
  },
  {
    "Key": ["First Level 2","Second Level 1","Third Level 1"],
    "Value": {
      "Id": 8,
      "Name": "Fourth Level 1",
      "Parent": {
        "Id": 7,
        "Name": "Third Level 1",
        "Parent": {
          "Id": 6,
          "Name": "Second Level 1",
          "Parent": {
            "Id": 5,
            "Name": "First Level 2",
            "Parent": null
          }
        }
      }
    }
  },
  {
    "Key": ["First Level 1","Second Level 1","Third Level 1"],
    "Value": {
      "Id": 9,
      "Name": "Fourth Level 2",
      "Parent": {
        "Id": 3,
        "Name": "Third Level 1",
        "Parent": {
          "Id": 2,
          "Name": "Second Level 1",
          "Parent": {
            "Id": 1,
            "Name": "First Level 1",
            "Parent": null
          }
        }
      }
    }
  }
]
Arghya C
  • Thanks but then after further tests I found it selects the wrong parent if I have duplicate names. I have added another line to my question which you can use to see your example failing. – nfplee Sep 25 '15 at 12:42
  • @nfplee Can your categories have only three levels, or any number of levels? Like 8 - First Level 2 > Second Level 1 > Third Level 1 > Fourth Level 3 > Fifth Level 2 > Sixth Level 3 ... ? – Arghya C Sep 25 '15 at 12:48
  • Unlimited levels. At the moment I'm playing with the idea from @Oliver where I will use a dictionary key for the full name which makes the lookup a lot easier. – nfplee Sep 25 '15 at 13:44
  • Thanks this is working better. I've accepted @Oliver's answer as his was working first but I have taken bits from both in my final solution. However I will up vote your answer. – nfplee Sep 28 '15 at 08:01

Because I prefer it that way, I made your class immutable:

public class Category
{
    public int Id { get; private set; }
    public string Name { get; private set; }
    public Category Parent { get; private set; }

    public Category(int id, string name, Category parent)
    {
        Id = id;
        Name = name;
        Parent = parent;
    }

    public override string ToString()
    {
        return Id + " " + Name
            + (Parent == null ? String.Empty : (Environment.NewLine + "   Parent: " + Parent));
    }
}

And by using this code I got a flat list of all available categories where each category gets a reference to its parent:

var categories = new Dictionary<String, Category>(StringComparer.InvariantCultureIgnoreCase);

using (var reader = new StringReader(_SampleData))
{
    string line;

    while ((line = reader.ReadLine()) != null)
    {
        if (String.IsNullOrWhiteSpace(line))
            continue;

        var elements = line.Split('-');
        var id = int.Parse(elements[0]);
        var name = elements[1].Trim();
        var index = name.LastIndexOf('>');
        Category parent = null;

        if (index >= 0)
        {
            var parentName = name.Substring(0, index).Trim();
            categories.TryGetValue(parentName, out parent);
        }

        var category = new Category(id, name, parent);
        categories.Add(category.Name, category);
    }
}

Just for visualization, call:

foreach (var item in categories.Values)
{
    Console.WriteLine(item);
}

And the output would be:

1 First Level 1
2 First Level 1 > Second Level 1
   Parent: 1 First Level 1
3 First Level 1 > Second Level 1 > Third Level 1
   Parent: 2 First Level 1 > Second Level 1
   Parent: 1 First Level 1
4 First Level 1 > Second Level 2
   Parent: 1 First Level 1
5 First Level 2
6 First Level 2 > Second Level 1
   Parent: 5 First Level 2
7 First Level 2 > Second Level 1 > Third Level 1
   Parent: 6 First Level 2 > Second Level 1
   Parent: 5 First Level 2
Oliver
  • Thanks, I was able to use this to get me going. I did have to make a mod so that the dictionary key and category name didn't have the same value. I don't need the category name to have the full path. Other than that it worked a treat. – nfplee Sep 28 '15 at 07:59
  • @nfplee: Then you should upvote it and/or mark it as the correct answer. Glad I could help. – Oliver Sep 28 '15 at 08:00
  • I was just about to. I just had to add a comment to @Arghya C to explain why I accepted your answer. Thanks again. – nfplee Sep 28 '15 at 08:04
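
The modification mentioned in the comments above (the dictionary key holds the full path while the category name holds only the last segment) was not posted. A rough sketch of how it might look, assuming the mutable Category class from the question; the CategoryParser class, the Parse method, and the choice of a case-insensitive ordinal comparer are all assumptions made for illustration, not the asker's final code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Category
{
    public int Id { get; set; }
    public string Name { get; set; }
    public Category Parent { get; set; }
}

public static class CategoryParser
{
    // Hypothetical helper: parses lines of the form "id - A > B > C".
    public static List<Category> Parse(string text)
    {
        // Key = normalized full path ("A > B > C"), so duplicate names under
        // different parents never collide.
        var byPath = new Dictionary<string, Category>(StringComparer.OrdinalIgnoreCase);
        var result = new List<Category>();

        foreach (var rawLine in text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var dash = rawLine.IndexOf('-');
            if (dash < 0)
                continue; // not an "id - path" line (for example "...")

            int id;
            if (!int.TryParse(rawLine.Substring(0, dash).Trim(), out id))
                continue; // skip lines whose id is not a number

            // Names cannot contain '-' or '>', so splitting on '>' is safe.
            var segments = rawLine.Substring(dash + 1).Split('>').Select(s => s.Trim()).ToArray();
            var path = string.Join(" > ", segments);

            Category parent = null;
            if (segments.Length > 1)
            {
                // The parent's key is the same path without its last segment.
                var parentPath = string.Join(" > ", segments, 0, segments.Length - 1);
                byPath.TryGetValue(parentPath, out parent); // stays null if the parent line is missing
            }

            var category = new Category { Id = id, Name = segments[segments.Length - 1], Parent = parent };
            byPath[path] = category;
            result.Add(category);
        }

        return result;
    }
}
```

Like the answers above, this relies on a parent line appearing before its children; a child whose parent path has not been seen yet simply gets a null Parent.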