
I have a multithreaded application that compares the CompareRow values of two lists, o1 and o2, computes their similarity, and stores o1.CompareRow, o2.CompareRow, and the similarity in a List. The similarity computation is very time-consuming and there are usually more than 1000 records, so multithreading should help, but in fact it does not. Please help me find out why. Things I have already considered:

1. The database should not be the problem, because I already load the data into two List<> instances.
2. There is no shared writable data for the threads to compete for.
3. The order of the records does not matter.

Please help, it is urgent; the deadline is close.

public class OriginalRecord
{
    public int PrimaryKey;
    public string CompareRow;
}
 ===============================================
public class Record
{
    // public ManualResetEvent _doneEvent;
    public string r1 { get; set; }
    public string r2 { get; set; }
    public float similarity { get; set; }
    public CountdownEvent CountDown;

    public Record(string r1, string r2, ref CountdownEvent _countdown)
    {
        this.r1 = r1;
        this.r2 = r2;
        //similarity = GetSimilarity(r1, r2);
        CountDown = _countdown;
    }

    public void ThreadPoolCallback(Object threadContext)
    {
        int threadIndex = (int)threadContext;
        similarity = GetSimilarity(r1, r2);
        CountDown.Signal();
    }

    private float GetSimilarity(object obj1, object obj2)
    {
        // Very time-consuming
        ComparisionLibrary.MatchsMaker match =
            new ComparisionLibrary.MatchsMaker(obj1.ToString(), obj2.ToString());
        return match.Score;
    }
}



    ================================================================
public partial class FormMain : Form
{
    public FormMain()
    {
        InitializeComponent();

        List<OriginalRecord> oList1=... //get loaded from database
        List<OriginalRecord> oList2 =... //get loaded from database

        int recordNum = oList1.Count * oList2.Count;
        CountdownEvent _countdown = new CountdownEvent(recordNum);

        Record[] rArray = new Record[recordNum];
        int num = 0;
        for (int i = 0; i < oList1.Count; i++)
        {
            for (int j = 0; j < oList2.Count; j++)
            {
                // Create a record instance
                Record r =
                    new Record(oList1[i].CompareRow, oList2[j].CompareRow, ref _countdown);
                rArray[num] = r;
                // Here the thread pool is used: one work item per pair
                ThreadPool.QueueUserWorkItem(r.ThreadPoolCallback, num);

                num++;
            }
        }
        // Blocks the constructor until every work item has signalled
        _countdown.Wait();

        List<Record> rList = rArray.ToList();
        PopulateGridView(rList);
    }
}

Here are the screenshots I captured in debug mode. Two things caught my attention:

1. Only 4 threads are created to do the work, although I set the thread pool's minimum thread count to 10.
2. Even though 4 threads are created, only one thread is working at any time; what is worse, sometimes none of the threads is working.

By the way, ComparisionLibrary is a library I downloaded to do the heavy work.

I can't post images. Could you please leave me an email address or something so that I can send the screenshots to you? Thanks.

Vicky Liao
  • I think it is recommended to use PLINQ http://msdn.microsoft.com/en-us/library/dd997425.aspx instead of the thread pool. You can do this easily with it. – Jayantha Lal Sirisena Mar 21 '12 at 07:43
  • I tried, but it did not really help: `Parallel.ForEach(oList1, (o1, state, i) => { Parallel.ForEach(oList2, (o2, state1, i1) => { Record r = new Record(o1.CompareRow, o2.CompareRow); rList.Add(r); }); });` where rList is a ConcurrentBag. – Vicky Liao Mar 21 '12 at 07:48
  • I also recommend using PLINQ (parallel for). You should also note that if your processor doesn't have enough physical cores (hyper-threading doesn't always work as advertised), running a program in multiple threads will actually slow everything down. – linkerro Mar 21 '12 at 07:56
  • I checked: my CPU is quad-core, and earlier, when I was running single-threaded, the CPU usage was always 10% or even lower. That is why I thought multithreading could make better use of the idle resources in the first place. – Vicky Liao Mar 21 '12 at 08:05
  • One thing I observed is that while the program is running, the UI is always frozen even though the CPU is not busy. Is that normal? – Vicky Liao Mar 21 '12 at 08:11
  • Strange, I've grabbed your code (and added `ComparisionLibrary.MatchsMaker` with `Thread.Sleep`), and the multi-threaded solution works much faster for me than the single-threaded one... How is `ComparisionLibrary.MatchsMaker` implemented? Are there any locks or similar bottlenecks? – alex.b Mar 21 '12 at 10:12

3 Answers


If you want to split your large task into several small tasks, keep in mind that parallelization only pays off if the small tasks are not too short in terms of run time. If you cut a 2-second job into 100,000 jobs of 0.02 ms each, the overhead of distributing the jobs over the workers can be so great that the process runs much slower in parallel than it would sequentially. You will only see a performance gain if the overhead of parallelization is much smaller than the average run time of one small task. Try to cut your problem up into larger chunks.
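One way to get larger chunks in C# might look like the sketch below. It is only an illustration under assumptions: `ChunkedCompare`, `CompareAll`, and `Score` are hypothetical names, and `Score` is a trivial stand-in for the expensive `ComparisionLibrary.MatchsMaker` call from the question. `Partitioner.Create` hands each worker a contiguous range of rows from the first list, so the scheduling cost is paid once per range rather than once per pair.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ChunkedCompare
{
    // Hypothetical stand-in for the expensive comparison in the question.
    public static float Score(string a, string b)
    {
        return a == b ? 1f : 0f;
    }

    // Computes one score per (i, j) pair. Each worker receives a contiguous
    // range of indices into 'left' and loops over all of 'right' itself.
    public static float[,] CompareAll(string[] left, string[] right)
    {
        var scores = new float[left.Length, right.Length];

        // Partitioner.Create rejects an empty range, so return early.
        if (left.Length == 0)
        {
            return scores;
        }

        Parallel.ForEach(Partitioner.Create(0, left.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                for (int j = 0; j < right.Length; j++)
                {
                    // Every (i, j) cell is written by exactly one worker.
                    scores[i, j] = Score(left[i], right[j]);
                }
            }
        });

        return scores;
    }
}
```

Calling `ChunkedCompare.CompareAll(listA, listB)` with two string arrays returns a matrix with one similarity per pair. Whether this is actually faster still depends on how the real comparison library behaves internally.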

Paul Hiemstra
  • See also http://stackoverflow.com/questions/9808495/multicore-with-plyr-mc for some information, although the example there is not using C#. – Paul Hiemstra Mar 21 '12 at 16:49

Hi, my assumptions:

1) If your computer has only one CPU, then you won't gain any performance improvement. In fact, you will lose performance because of context switching.

2) Try setting the maximum number of threads of the ThreadPool class.

ThreadPool.SetMaxThreads(workerThreads, completionPortThreads);

3) Try to use PLINQ.
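For point 3, a PLINQ version could look roughly like this. It is a sketch under assumptions rather than a drop-in fix: `PlinqCompare`, `CompareAll`, and `Score` are hypothetical names, and `Score` merely stands in for the real similarity call. The outer list is parallelized with `AsParallel()`, and each worker walks the second list sequentially, so no explicit thread pool calls or countdown events are needed.

```csharp
using System;
using System.Linq;

public static class PlinqCompare
{
    // Hypothetical stand-in for the expensive comparison in the question.
    public static float Score(string a, string b)
    {
        return a == b ? 1f : 0f;
    }

    // Returns one (left, right, score) tuple per pair.
    // The order of the results is not guaranteed.
    public static Tuple<string, string, float>[] CompareAll(string[] left, string[] right)
    {
        return left
            .AsParallel()
            // Each element of 'left' is paired with every element of 'right'
            // on whichever worker picked that element up.
            .SelectMany(l => right.Select(r => Tuple.Create(l, r, Score(l, r))))
            .ToArray();
    }
}
```

Since the question states that the order of the records does not matter, the query is left unordered; adding `AsOrdered()` after `AsParallel()` would preserve order at some cost.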

Taras Feschuk
  • My CPU is quad-core. I set the thread count to 100, and I can see more than 100 threads running with CPU usage at nearly 80% to 90%, but it still takes more time than a single thread. So sad. – Vicky Liao Mar 21 '12 at 07:59
  • Actually, I thought of context switching before, but nothing proves that it is the issue. Besides, context switching seems unavoidable, so there is nothing I can do about it... – Vicky Liao Mar 21 '12 at 08:02

As a guess, try to use the TPL instead of the plain ThreadPool, e.g. replace

ThreadPool.QueueUserWorkItem(r.ThreadPoolCallback, num);

with something like this:

int index = num; // copy, so the lambda captures the value for this iteration
Task t = new Task(() =>
{
    r.ThreadPoolCallback(index);
});
t.Start();

EDIT: You're comparing two lists, list1 and list2, with each pair of items handled by a separate work item in the pool. That is list1.Count*list2.Count thread pool calls, which looks to me like too much parallelization (it could cause many context switches).

Try to split all the comparison tasks into several groups (e.g. 4-8) and run each group in a separate thread. Maybe this could help. But again, this is only an assumption.
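The grouping idea might be sketched as follows. This is only an illustration under assumptions: `GroupedCompare`, `CompareAll`, `Score`, and the `groupCount` parameter are hypothetical, and `Score` replaces the real library call. The flat range of pair indices is cut into `groupCount` nearly equal slices, and one task processes each slice.

```csharp
using System;
using System.Threading.Tasks;

public static class GroupedCompare
{
    // Hypothetical stand-in for the expensive comparison in the question.
    public static float Score(string a, string b)
    {
        return a == b ? 1f : 0f;
    }

    // Result index k corresponds to the pair (k / right.Length, k % right.Length).
    public static float[] CompareAll(string[] left, string[] right, int groupCount)
    {
        int total = left.Length * right.Length;
        var scores = new float[total];
        if (groupCount < 1)
        {
            groupCount = 1;
        }

        var tasks = new Task[groupCount];
        for (int g = 0; g < groupCount; g++)
        {
            // Declared inside the loop so each lambda captures its own bounds.
            int start = (int)((long)total * g / groupCount);
            int end = (int)((long)total * (g + 1) / groupCount);

            tasks[g] = Task.Run(() =>
            {
                for (int k = start; k < end; k++)
                {
                    scores[k] = Score(left[k / right.Length], right[k % right.Length]);
                }
            });
        }

        // Block until every group has finished.
        Task.WaitAll(tasks);
        return scores;
    }
}
```

A call such as `GroupedCompare.CompareAll(listA, listB, Environment.ProcessorCount)` would use one group per logical core, which is in the same range as the 4-8 groups suggested above.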

alex.b