
I am trying to do some string manipulation for a product import. Unfortunately, I have some duplicate data which, if left in, would assign products to categories that I don't want them assigned to.

I have the following string:

Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2

The outcome I would like is:

Category A>Sub Category 1

Category B

Category C>Sub Category 2

First I split on the pipe (|), which gives me:

Category A

Category A > Sub Category 1

Category B

Category C

Category C > Sub Category 2

I then loop through this list and split each item on (>).

But I don't know how to merge the results, for example into Category A\Sub Category 1.

Below is the code. It will be used to process approximately 1200 rows, so I am trying to make it as quick as possible.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            string strProductCategories = "Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2";

            // Split the row into individual category paths
            List<string> firstSplitResults = strProductCategories.SplitAndTrim('|');

            // Split each path into its levels
            List<List<string>> secondSplitResults = new List<List<string>>();

            foreach (string firstSplitResult in firstSplitResults)
            {
                List<string> d = firstSplitResult.SplitAndTrim('>');
                secondSplitResults.Add(d);
            }

            // PrintResults(firstSplitResults);
            PrintResults2(secondSplitResults);
        }

        public static void PrintResults(List<string> results)
        {
            foreach (string value in results)
            {
                Console.WriteLine(value);
            }
        }

        public static void PrintResults2(List<List<string>> results)
        {
            foreach (List<string> parent in results)
            {
                foreach (string value in parent)
                {
                    Console.Write(value);
                }

                Console.WriteLine(".....");
            }
        }
    }

    public static class StringExtensions
    {
        // Splits on the delimiter and trims whitespace from every piece
        public static List<string> SplitAndTrim(this string value, char delimiter)
        {
            if (string.IsNullOrWhiteSpace(value))
            {
                return null;
            }

            return value.Split(delimiter).Select(i => i.Trim()).ToList();
        }
    }

Once I have the list correct, I will rejoin it with (\).

Any help would be very useful.

UPDATE

The data comes from a CSV, so it could have any number of levels.

For example:

Category A -> THIS DATA IS REDUNDANT

Category A > Sub Category 1 -> THIS DATA IS REDUNDANT

Category A > Sub Category 1 > Sub Sub Category 1

Category A > Sub Category 1 > Sub Sub Category 2

Would result in:

Category A > Sub Category 1 > Sub Sub Category 1

Category A > Sub Category 1 > Sub Sub Category 2
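A minimal sketch of the rule described in the update, assuming the "-> ... REDUNDANT" notes are annotations rather than part of the CSV data, and that '>' and '|' never occur inside a category name: split on (|) and (>), collect every proper prefix of every path in a HashSet, and keep only the paths that are not in that set, rejoined with (\). `CategoryPaths` and `KeepDeepestPaths` are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CategoryPaths
{
    // Hypothetical helper: keeps only the deepest category paths and joins their levels with '\'.
    public static List<string> KeepDeepestPaths(string input)
    {
        // Split the row on '|', then each path on '>', trimming every level.
        List<string[]> paths = input
            .Split('|')
            .Select(p => p.Split('>').Select(s => s.Trim()).ToArray())
            .ToList();

        // Record every proper prefix, e.g. "Category A" for "Category A\Sub Category 1".
        var prefixes = new HashSet<string>();
        foreach (string[] segments in paths)
        {
            for (int len = 1; len < segments.Length; len++)
            {
                prefixes.Add(string.Join("\\", segments.Take(len)));
            }
        }

        // A path survives only if no other path extends it; Distinct() drops exact duplicates.
        return paths
            .Select(segments => string.Join("\\", segments))
            .Where(path => !prefixes.Contains(path))
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        string input = "Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2";

        // Prints:
        // Category A\Sub Category 1
        // Category B
        // Category C\Sub Category 2
        foreach (string path in KeepDeepestPaths(input))
        {
            Console.WriteLine(path);
        }
    }
}
```

Because prefixes are compared level by level, this handles any number of levels and does not treat "Category A" as a parent of a sibling such as "Category AB". The HashSet lookup also avoids comparing every entry with every other entry.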

Simon

user9252
  • I just tested your code using `Split()`, `First()`, and `Join()`; I will post a working solution below in a second. – MethodMan Nov 20 '15 at 19:28
  • I don't understand your criteria for what should be included in the output. Why should "Category B" be included but not "Category A", or "Category C"? – Bradley Uffner Nov 20 '15 at 19:31
  • @user9252 look at the answer I posted; it's pretty straightforward and it yields the expected results. – MethodMan Nov 20 '15 at 20:55
  • @user9252 will the format ever change? If so, please let us know so that I can undelete the answer I posted. – MethodMan Nov 20 '15 at 21:03
  • @user9252, Nosik's answer is extremely clever; however, see mine, which uses LINQ. LINQ can be a little faster than string manipulation (and is in these tests), and it can be easier to understand and maintain, because the semantics are on a higher level. – toddmo Nov 20 '15 at 23:42

4 Answers


You have a good start; basically, you just need to add some code at the end to complete the solution.

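// Requires using System.Linq;
// Collect the parent of every two-level entry, e.g. "Category A" from ["Category A", "Sub Category 1"]
var parents = secondSplitResults.Where(x => x.Count == 2).Select(x => x[0]).ToList();

// Remove the single-level entries that also appear as a parent
secondSplitResults.RemoveAll(x => x.Count == 1 && parents.Contains(x[0]));

// Put a separator between parent and child
foreach (List<string> i in secondSplitResults)
{
    if (i.Count == 2)
    {
        i.Insert(1, "/");
    }
}

PrintResults2(secondSplitResults);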
Prometheus
  • I've tried your suggestion, but I could not get it to compile. Would your answer cater for any number of subcategories? – user9252 Nov 20 '15 at 20:48
  • @Prometheus what did you post? Can you explain? I am not sure what this is, but it's definitely not an answer. Can you test it and tell me if it really works? – MethodMan Nov 20 '15 at 21:02
  • You need to include LINQ (`using System.Linq;`). However, you added more to the question, so you'll have to allow me to modify the answer for more than two levels of hierarchy. – Prometheus Nov 20 '15 at 22:31

If the leaf elements you marked as "redundant" are removed, the problem reduces to finding the longest path among items with a common prefix:

using System;
using System.Collections.Generic;
using System.Linq;

// Uses the SplitAndTrim extension method from the question.
class Program
{
    static void Main(string[] args)
    {
        string pathCase1 = "Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2";
        string pathCase2 = "Category A -> THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 -> THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 > Sub Sub Category 1|Category A > Sub Category 1 > Sub Sub Category 2";
        PrintPaths("case1", ParsePaths(pathCase1));
        PrintPaths("case2", ParsePaths(pathCase2));

        Console.ReadLine();
    }

    private static void PrintPaths(string name, List<string> paths)
    {
        Console.WriteLine(name);
        Console.WriteLine();

        foreach (var item in paths)
        {
            Console.WriteLine(item);
        }

        Console.WriteLine();
    }

    static string NormalizePath(string src)
    {
        // Remove "-> THIS DATA IS REDUNDANT" elements
        int idx = src.LastIndexOf('>');
        if (idx > 0 && src[idx - 1] == '-')
        {
            src = src.Substring(0, idx - 1);
        }

        var parts = src.SplitAndTrim('>');
        return string.Join(">", parts);
    }

    static List<string> ParsePaths(string text)
    {
        var items = text.SplitAndTrim('|');
        for (int i = 0; i < items.Count; ++i)
        {
            items[i] = NormalizePath(items[i]);
        }

        // Sorting puts each parent before its children
        items.Sort();

        var longestPaths = new SortedSet<string>();

        foreach (var s in items)
        {
            // Drop the immediate parent of this path if it was added earlier
            int idx = s.LastIndexOf('>');
            if (idx > 0)
            {
                var prefix = s.Substring(0, idx);
                longestPaths.Remove(prefix);
            }

            longestPaths.Add(s);
        }

        return longestPaths.ToList();
    }
}

Output:

case1

Category A>Sub Category 1
Category B
Category C>Sub Category 2

case2

Category A>Sub Category 1>Sub Sub Category 1
Category A>Sub Category 1>Sub Sub Category 2
alexm
  • Just tried your code; I think it is nearly there. If I put a 'Category B' into the original string, I would expect it in the answers, as there is only one. – user9252 Nov 20 '15 at 22:19
  • @user9252 I first typed it without testing. Now it works correctly, if you are still interested... – alexm Nov 21 '15 at 00:39
  • @user9252: the answer you accepted does not return the expected output in case #2 – alexm Nov 21 '15 at 00:51

I may have misunderstood the question, but I think I did this in two lines of code:

https://dotnetfiddle.net/GyDwar

using System;
using System.Linq;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        foreach(var part in getParts("Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2"))
            Console.WriteLine(part);
        Console.WriteLine();

        Console.WriteLine("TEST 2");
        foreach(var part in getParts("Category A > THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 > THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 > Sub Sub Category 1|Category A > Sub Category 1 > Sub Sub Category 2"))
            Console.WriteLine(part);
    }

    public static List<string> getParts(string stringToParse){
        var parts = stringToParse.Split('|').Select(part => part.Trim());
        return parts.Where(part => !parts.Any(comparePart => part != comparePart && comparePart.StartsWith(part))).ToList();
    }
}

Result:

Category A > Sub Category 1
Category B
Category C > Sub Category 2

TEST 2
Category A > THIS IS DATA IS REDUNDANT
Category A > Sub Category 1 > THIS IS DATA IS REDUNDANT
Category A > Sub Category 1 > Sub Sub Category 1
Category A > Sub Category 1 > Sub Sub Category 2

Basically, I take all the parts that do not form the beginning of another part.
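A sketch of the same filter with one caveat addressed, assuming paths are normalized to "Level>Level" as in alexm's answer: a raw `StartsWith(part)` also matches when one name merely begins with another, so "Category B" would be dropped if a sibling "Category BB" existed. Requiring the candidate parent to be followed by the '>' separator avoids that. `PrefixFilter`, `Normalize`, and `GetParts` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixFilter
{
    // Turns "Category A > Sub Category 1" into "Category A>Sub Category 1".
    private static string Normalize(string path) =>
        string.Join(">", path.Split('>').Select(s => s.Trim()));

    // Keeps the parts that do not begin another part, but only counts a match
    // that ends on a level boundary (the parent followed by '>').
    public static List<string> GetParts(string stringToParse)
    {
        List<string> parts = stringToParse.Split('|').Select(Normalize).ToList();
        return parts
            .Where(part => !parts.Any(other => other.StartsWith(part + ">", StringComparison.Ordinal)))
            .ToList();
    }

    public static void Main()
    {
        // Prints:
        // Category A>Sub Category 1
        // Category B
        // Category BB
        foreach (string part in GetParts("Category A|Category A > Sub Category 1|Category B|Category BB"))
        {
            Console.WriteLine(part);
        }
    }
}
```

With the unbounded `StartsWith`, the same input would lose "Category B", because "Category BB" starts with it.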

toddmo

After you split on the (|), go through the list and count the occurrences of each item within the initial string. If an item occurs more than once in the initial string, remove it. The resulting list is what you need. I took the occurrence-counting approach from "How would you count occurrences of a string within a string?"; as far as I can tell, it's the fastest approach.

    string strProductCategories = "Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2";

    List<string> firstSplitResults = strProductCategories.SplitAndTrim('|');

    for (int i = 0; i < firstSplitResults.Count; i++)
    {
        int occCount = (strProductCategories.Length - strProductCategories.Replace(firstSplitResults[i], "").Length) / firstSplitResults[i].Length;
        if (occCount > 1)
        {
            firstSplitResults.RemoveAt(i);
            i--;
        }
    }

    // print result
    for (int i = 0; i < firstSplitResults.Count; i++)
    {
        Console.WriteLine(firstSplitResults[i]);
    }
    Console.ReadLine();
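A small worked check of the Replace-based count used above, on a hypothetical input that adds a sibling "Category BB" (not in the question's data). The formula counts substring matches rather than whole entries, so an entry is also counted inside any longer name that starts with it. `OccurrenceCount` and `CountOccurrences` are illustrative names wrapping the expression from the answer.

```csharp
using System;

public static class OccurrenceCount
{
    // The expression from the answer: remove every match of `item`, then divide
    // the number of removed characters by the item's length.
    public static int CountOccurrences(string text, string item) =>
        (text.Length - text.Replace(item, "").Length) / item.Length;

    public static void Main()
    {
        string input = "Category A|Category A > Sub Category 1|Category B|Category BB";

        // 2: once on its own and once inside its child path, so it is removed as intended
        Console.WriteLine(CountOccurrences(input, "Category A"));
        // 2: it also matches inside "Category BB", so the loop would drop it
        Console.WriteLine(CountOccurrences(input, "Category B"));
        // 1: a leaf path occurs once and is kept
        Console.WriteLine(CountOccurrences(input, "Category A > Sub Category 1"));
    }
}
```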
makison