I have working code that splits the string held in one property of every item in a List<Dataframe>, where Dataframe is a class with three string properties.

Right now I declare an empty collection of Dataframe2 (string, string[], string) and append items to it using Add:

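using System.Collections.Concurrent; // ConcurrentBag<T>
using System.Collections.Generic;    // List<T>
using System.Threading.Tasks;        // Parallel.ForEach

class Program
{
    // Splits a text into words on single spaces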
    public static string[] SPString(string text)
    {
        string[] elements;
        elements = text.Split(' ');
        return elements;
    }

    //Structures
    public class Dataframe
    {

        public string Name { get; set; }
        public string Text { get; set; }
        public string Cat { get; set; }
    }

    public class Dataframe2
    {

        public string Name { get; set; }
        public string[] Text { get; set; }
        public string Cat { get; set; }
    }



    static void Main(string[] args)
    {

        List<Dataframe> doc = new List<Dataframe>
        {
            new Dataframe { Name = "Doc1", Text = "The quick brown cat", Cat = "" },
            new Dataframe { Name = "Doc2", Text = "The big fat cat", Cat = "Two" },
            new Dataframe { Name = "Doc4", Text = "The quick brown rat", Cat = "One" },
            new Dataframe { Name = "Doc3", Text = "Its the cat in the hat", Cat = "Two" },
            new Dataframe { Name = "Doc5", Text = "Mice and rats eat seeds", Cat = "One" },
        };

        // Can this be made more efficient?
        ConcurrentBag<Dataframe2> doc2 = new ConcurrentBag<Dataframe2>();
        Parallel.ForEach(doc, entry =>
        {
            string s = entry.Text;
            string[] splitter = SPString(s);
            doc2.Add(new Dataframe2 { Name = entry.Name, Text = splitter, Cat = entry.Cat });
        } );

    }
}

Is there a more efficient way to add items to a list using Parallel LINQ (PLINQ), where Dataframe2 carries over the properties I did not modify?

ccsv
  • I have a hard time understanding what you want to achieve. Also, don't use `List` with concurrency. It will have unexpected results. Use `ConcurrentBag` instead. – Patrick Hofman Jul 31 '15 at 08:07
  • @PatrickHofman I am trying to find out whether there is a more efficient way to add items to the `list` than `doc2.Add(new Dataframe2 {Name = entry.Name, Text = splitter, Cat =entry.Cat});`, such as one that applies a mask or maps over the properties I do not use. Also, I am not really familiar with `ConcurrentBag`, but I assume it is a thread-safe list? – ccsv Jul 31 '15 at 08:12
  • Indeed. It is thread-safe. `List` isn't. – Patrick Hofman Jul 31 '15 at 08:12
  • @PatrickHofman OK, I changed it to bags. Thanks, I did not know, since I have only just started working with parallel code. – ccsv Jul 31 '15 at 08:16
  • What do you mean by *more efficient*? – Dzienny Jul 31 '15 at 08:17
  • @Dzienny Instead of looping over each entry, are there methods that copy the entire columns I do not use and map them onto the new list? – ccsv Jul 31 '15 at 08:23
  • I very much suspect your code is too simple to benefit from asynchronicity, concurrency or parallelization. – Jodrell Jul 31 '15 at 08:31
  • @Jodrell I am not going to post the full code, for the obvious reason of space limitations. Also, `Parallel.ForEach` has a speed advantage; see the example in the answer here: http://stackoverflow.com/questions/12251874/using-a-parallel-foreach-loop-instead-of-a-regular-foreach – ccsv Jul 31 '15 at 08:32

1 Answer

You can try PLINQ to add parallelism while still ending up with a List<T>:

// Do NOT create a List<T> (which is not thread-safe) and then fill it manually in parallel;
// let PLINQ build it for you
List<Dataframe2> doc2 = doc
  .AsParallel()
  .Select(entry => {
     //TODO: make Dataframe2 from given Dataframe (entry)
     ...
     return new Dataframe2 {Name = entry.Name, Text = splitter, Cat = entry.Cat};
  }) 
  .ToList();
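
For reference, here is a minimal, self-contained sketch of the same idea with the placeholder filled in by a plain `Split(' ')`, as in the question's `SPString`. The class name `DataframeSplitter` and the method name `SplitAll` are hypothetical names chosen for illustration, and the `AsOrdered()` call is an optional addition: PLINQ does not guarantee source order by default, so it is only needed if `doc2` should keep the order of `doc`.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Dataframe
{
    public string Name { get; set; }
    public string Text { get; set; }
    public string Cat { get; set; }
}

public class Dataframe2
{
    public string Name { get; set; }
    public string[] Text { get; set; }
    public string Cat { get; set; }
}

// Hypothetical helper, not part of the original question or answer.
public static class DataframeSplitter
{
    // Projects every Dataframe into a Dataframe2, splitting Text on spaces.
    // Name and Cat are copied unchanged; PLINQ builds the resulting list.
    public static List<Dataframe2> SplitAll(IEnumerable<Dataframe> docs)
    {
        return docs
            .AsParallel()
            .AsOrdered() // optional: keep the results in source order
            .Select(entry => new Dataframe2
            {
                Name = entry.Name,
                Text = entry.Text.Split(' '),
                Cat = entry.Cat
            })
            .ToList();
    }
}
```

With the question's data, this would be called as `List<Dataframe2> doc2 = DataframeSplitter.SplitAll(doc);`. Each entry still has to be visited once, since every `Dataframe2` is a new object; the projection only removes the manual `Add` calls and the need for a thread-safe collection.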
Dmitry Bychenko
  • I suspect this would actually be quicker without the `.AsParallel()`, but it's less "clunky" than `Parallel.ForEach`. – Jodrell Jul 31 '15 at 08:30
  • @Jodrell: it depends on the size of the `doc` list, the evaluation cost of `Split` (i.e. the length of the strings), etc. The best choice, IMHO, is to *comment out* `AsParallel()` and see. – Dmitry Bychenko Jul 31 '15 at 08:34
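
One rough way to "comment out and see" is to time both variants with a `Stopwatch`. The sketch below is only an illustration: the names `SplitBenchmark`, `SplitSequential`, `SplitParallel`, `Time` and `Compare` are hypothetical, the input is synthetic (the same short sentence repeated `count` times), and a single timed run is noisy, so treat the numbers as a hint rather than a proper benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper for comparing the two variants; not from the original thread.
public static class SplitBenchmark
{
    // Plain LINQ: one thread, no partitioning overhead.
    public static List<string[]> SplitSequential(IReadOnlyList<string> texts) =>
        texts.Select(t => t.Split(' ')).ToList();

    // PLINQ: the same projection, spread across worker threads.
    public static List<string[]> SplitParallel(IReadOnlyList<string> texts) =>
        texts.AsParallel().AsOrdered().Select(t => t.Split(' ')).ToList();

    // Measures the wall-clock time of a single call to the given action.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Builds synthetic input (an assumption, not the asker's real data)
    // and times both variants after one warm-up pass each.
    public static (TimeSpan Sequential, TimeSpan Parallel) Compare(int count)
    {
        List<string> texts = Enumerable.Repeat("The quick brown cat", count).ToList();

        SplitSequential(texts); // warm-up, so JIT compilation is not timed
        SplitParallel(texts);

        TimeSpan sequential = Time(() => SplitSequential(texts));
        TimeSpan parallel = Time(() => SplitParallel(texts));
        return (sequential, parallel);
    }
}
```

Calling `SplitBenchmark.Compare(5)` and then `SplitBenchmark.Compare(1_000_000)` would show how the comparison shifts with list size; which variant wins depends on the machine and the real strings.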