0

I am trying to create an xml by looping through List<TestRequest>. for better performance , i am trying Parallel.ForEach for looping as there are thousands of records in the list however i am not getting consistent data in xml, sometime there is a truncation in xml string while appending to string builder and sometimes there is data mismatch. below is the code

   public class Program
    {


        static void Main(string[] args)
        {
            List<TestRequest> ids = new List<TestRequest>();
            Random rnd = new Random();
            int id = rnd.Next(1, 12345);
            for (int i = 1; i < 1000; i++)
            {
                var data = new TestRequest();
                data.dataId = id;
                ids.Add(data);
            }

            var xmlData = GetIdsinXML(ids);

        }

        private static string GetIdsinXML(List<TestRequest> Ids)
        {
            var sb = new StringBuilder();
            sb.Append("<ROOT>");

            Parallel.ForEach(Ids, id =>
            {

                sb.Append("<Row");
                sb.Append(" ID='" + id.dataId + "'");
                sb.Append("></Row>");


            }
            );


            sb.Append("</ROOT>");
            return sb.ToString();
        }


    }

    public class TestRequest
    {
        public int dataId { get; set; }

    }

is this is the correct way of using Parallel.ForEach ?

Please help. Thanks!

Harshit
  • 149
  • 3
  • 14
  • 3
    You can't use `Parallel.ForEach` like that. `StringBuilder` is not thread-safe. – mjwills Jan 22 '19 at 11:04
  • Possible duplicate of [Is .NET's StringBuilder thread-safe](https://stackoverflow.com/questions/8831385/is-nets-stringbuilder-thread-safe) – mjwills Jan 22 '19 at 11:04
  • 1
    Not only is StringBuilder not thread-safe, but you have 3 sb.Append invocations per task, which is not atomic either... –  Jan 22 '19 at 11:08
  • 3
    Why are you using `Parallel.ForEach` in this case? Why not use .NET's serialization mechanisms? What you try here, apart from not working, will keep the entire XML string in memory *and* generate a temporary string for each row due to that concatenation. .NET's libraries write directly to streams and files instead – Panagiotis Kanavos Jan 22 '19 at 11:09
  • @PanagiotisKanavos..to improve performance as there are thousands of records.is there other way of doing it without using string builder? – Harshit Jan 22 '19 at 11:12
  • 1
    Its actually possible with Parallel foreach. you should use different string builders and merge them finally. – M.kazem Akhgary Jan 22 '19 at 11:12
  • Try avoid using a string as result type then. If you want to process your records in a parallel manner, don't use a string as the resulting data type, but some other data type that will allow you to process data in parallel. –  Jan 22 '19 at 11:13
  • 2
    @Harshit thousands of records is no data at all. Why do you need to improve performance? What isn't running fast enough? There's a reason `serialization` is called that way anyway. You can't write a text file at random, you need to write its elements in order. Have you tried .NET's serializers? LINQ to XML? Some other serializer? – Panagiotis Kanavos Jan 22 '19 at 11:14
  • How many is 1000s? 10, 20 100000, 1234987239482374. Does it matter if the id has 000 in front of it. we need some more information – TheGeneral Jan 22 '19 at 12:13
  • Definitely move the XML generation in `TestRequest` as a method, and use a parallel loop to collect all the rows. In the end, join the rows and append the body tags to finish up. – John Alexiou Jan 22 '19 at 12:34
  • 1
    Your code has a bug. Set the `id` _inside_ the loop, or else all the entries will have the same id. Better yet do it all in one statement `var data = new TestRequest() { dataId=rnd.Next(1, 12345) };` – John Alexiou Jan 22 '19 at 23:39

2 Answers2

2

Here is the easiest way to do what you want in a parallel way:

public class TestRequest
{
    public int dataId { get; set; }
    public string ToXml() => $"<row id=\"{dataId}\"/>";
}

class Program
{
    static void Main(string[] args)
    {
        const int n = 10000000;
        List<TestRequest> ids = new List<TestRequest>();
        Random rnd = new Random();
        for (int i = 1; i<n; i++)
        {
            var data = new TestRequest
            {
                dataId=rnd.Next(1, 12345)
            };
            ids.Add(data);
        }
        Stopwatch sw = Stopwatch.StartNew();
        var xml = GetIdsinXML(ids);
        sw.Stop();
        double time = sw.Elapsed.TotalSeconds;

        File.WriteAllText("result.xml", xml);
        Process.Start("result.xml");

        var output = $"Size={n} items, Time={time} sec, Speed={n/time/1000000} M/sec";

#if DEBUG
        Debug.WriteLine(output);
#else
        Console.WriteLine(output);
#endif

    }

    static string GetIdsinXML(List<TestRequest> requests)
    {
        // parallel query
        var list = requests.AsParallel().Select((item) => item.ToXml());
        // or sequential below:
        // var list = requests.Select((item) => item.ToXml());

        return $"<root>\r\n{string.Join("\r\n", list)}\r\n</root>";
    }
}

On my crappy computer, without the .AsParallel() statement, executing sequentially I get about 1,600,000 operations per second. With the parallel statement, this jumps to 2,100,000 operations per second.

NOTE: I have replaced SpringBuilder with the built-in method string.Join(string, IEnumerable list) which should be fairly optimized already by Microsoft. As an interesting side note, the Debug build is as fast if not even faster than the Release build. Go figure.

John Alexiou
  • 28,472
  • 11
  • 77
  • 133
  • totally overlooked that one. clever solution and much faster. interestingly on my rig with default degree of parallelism (8 threads) I get `2.9` seconds, with `2` degree of parallelism i get `1.3` seconds! without parallel i get `2.4` seconds. – M.kazem Akhgary Jan 23 '19 at 02:07
  • 1
    @M.kazemAkhgary - FYI, I am getting 6 and 4.5 seconds on a 4 core machine. The slower performance with higher parallelism is due to overhead, and that some of the cores must be busy doing other work. For this long and fairly fast calculation, it seems the most efficient is using only 2 cores. Each problem is different. – John Alexiou Jan 23 '19 at 02:41
1

With PLINQ you can do something like this.

I'm not sure if you would really need this. I just put it for sake of answer.

Your code doesn't work because StringBuilder is not thread safe and your operations are not atomic which means your code has race condition.

sb.Append("<Row");
sb.Append(" ID='" + id.dataId + "'");
sb.Append("></Row>");

For example one thread may execute line 1, the other thread just after that executes line 3. you will have <Row></Row>. this race condition happens all the time and final result is gibberish.

One way to fix this is to use different StringBuilders on different threads and finally append the result of those builders sequentially.

If running threads do very light tasks and finish quickly, doing things in parallel will only slow down your program.

return Ids.AsParallel()
   .Select((id, index) => (id, index))
   .GroupBy(x => x.index%Environment.ProcessorCount, x => x.id, (k, g) => g)
   .Select(g => 
   {
       var sb = new StringBuilder();
       foreach (var id in g)
       {
           sb.Append("<Row");
           sb.Append(" ID='" + id.dataId + "'");
           sb.Append("></Row>");
       }
       return sb.ToString();
   })
   .AsSequential()
   .Aggregate(new StringBuilder("<ROOT>"), (a, b) => a.Append(b))
   .Append("</ROOT>").ToString();

Measure the performance and see if it really does improve or not. if it doesn't don't do it in parallel.

mjwills
  • 23,389
  • 6
  • 40
  • 63
M.kazem Akhgary
  • 18,645
  • 8
  • 57
  • 118
  • `AsSequential` seems like an odd thing to use alongside `AsParallel`. You are sure it will run in parallel? – mjwills Jan 22 '19 at 12:02
  • It does run in parallel what ever comes before `AsSequential`. What ever comes after it is sequential. I haven't checked source code yet but i confirmed it by putting breakpoint inside and its hit by multiple threads. @mjwills – M.kazem Akhgary Jan 22 '19 at 13:13