3

It came out as a surprise when my parallelized version of fibonacci implementation(inefficient, just to compare the performance of the library) was much much slower than normal inefficient implementation, even after using all the 8 logical cores of my i7-6700HQ processor. Processor fans started going haywire the processing time was very slow compared to non-parallel implementation.

The example is directly from the TBB tutorial on intel - https://www.threadingbuildingblocks.org/tutorial-intel-tbb-task-based-programming

Here is my code

#include <tbb/task_group.h>
#include <chrono>
#include <iostream>

#define FIB_NUM 40

long fib1(int n)
{
    if(n < 2) return n;

    else
    {
        int x, y;
        tbb::task_group g;
        g.run([&]{x=fib1(n - 1);});
        g.run([&]{y=fib1(n - 2);});
        g.wait();
        return x + y;
    }
}

long fib2(int n)
{
    return n < 2? n : fib2(n - 1) + fib2(n - 2);
}

int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    std::cout << fib2(FIB_NUM) << std::endl;
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << (t2 - t1).count() << std::endl;
    t1 = std::chrono::high_resolution_clock::now();
    std::cout << fib1(FIB_NUM) << std::endl;
    t2 = std::chrono::high_resolution_clock::now();
    std::cout << (t2 - t1).count() << std::endl;
}

I dont know what I am doing wrong. It would be helpful if someone could point it out.

Thanks

Sagar B Hathwar
  • 536
  • 6
  • 18
  • 1
    from your link: "is not a scalable solution" => it's not surprising that implementation is slower than trivial one. You create a lot of thread, and synchronize them, just to do an addition... overhead of creating thread is bigger than operation you realize. – Garf365 Sep 05 '16 at 09:20
  • Oh, I see that now! I actually wanted to parallelize recursive matrix multiplication as an assignment. I thought of using tbb for that purpose. If this is the case, then is TBB worth using for that application? Infact, what are the application where tbb is suitable? Would like some resources to guide me. Thank you – Sagar B Hathwar Sep 05 '16 at 09:23
  • Asking for external resources is Off Topic on SO. But, maybe, a general question about opportunities to use thread to increase speed of computation can be accepted (not sure). ou can use thread (with tbb or other lib) when you compute a really big amount of independent data (like apply filter on image, or compute average – Garf365 Sep 05 '16 at 10:33
  • 1
    Fundamentally your problem is that the work accomplished by a forked thread, is very small compared to the overhead of creating the thread and tearing it down when it completes. In this case, you lose by going parallel. There is a nastier problem hiding in the "big stack" model used by most compilers: lots of threads each with their own big stack burn up all the memory so you don't get cache wins and you might page-fault thrash. You might find this SO answer about another parallel language implementation of Fibonacci that actually works interesting: http://stackoverflow.com/a/5086087/120163 – Ira Baxter Sep 05 '16 at 11:41

1 Answers1

7

The main problem of the example is small tasks. The leaf tasks (n<2) just calculate return n. Undoubtedly, it is inefficient for parallelism. The example can be improved with "cutoff" condition when the sub-problem is considered to be too small for parallelization. Let's suppose that there is no sense to calculate the first 12 fibonacci numbers in parallel and we will use a serial implementation instead:

long fib1(int n)
{
    // Use a serial implementation for "small" numbers.
    if(n < 12) return fib2(n);
    else
    {
        int x, y;
        tbb::task_group g;
        g.run([&]{x=fib1(n - 1);});
        g.run([&]{y=fib1(n - 2);});
        g.wait();
        return x + y;
    }
}

Perhaps, you want to read articles about Divide and Conquer and The Task Scheduler.

P.S. Intel TBB uses task based approach for parallelism. The method tbb::task_group::run creates a task (not a thread) that is executed when one of threads from the thread pool is available. So it does not matter how many tasks in the system - the number of threads is always limited.

Alex
  • 612
  • 3
  • 9
  • I CANNOT open the link in your answer, and get '403 Forbidden'. B.T.W it seems I CANNOT open any pages on [https://software.intel.com](https://software.intel.com), and always get '403 Forbidden'. When I used to google something on TBB, the first answer was always from this site, and I couldn't open it... Do you have any idea on that? – for_stack Sep 05 '16 at 15:58
  • the same links on open source site https://www.threadingbuildingblocks.org/docs/help/index.htm#tbb_userguide/Design_Patterns/Divide_and_Conquer.html and https://www.threadingbuildingblocks.org/docs/help/hh_goto.htm?index.htm#tbb_userguide/The_Task_Scheduler.html – Vladimir Polin Sep 05 '16 at 18:36
  • our TBB support team asked to publish this response: Can you try a different browser accessing software.intel.com? If it does not help I suggest to install an HTTP analyzer http://www.ieinspector.com/httpanalyzer/index.html to inspect what is the real HTTP request is sent by your browser and what is the server answer. Some plug-ins installed in web-browsers can affect the HTTP request generation. – Vladimir Polin Sep 06 '16 at 10:29