Consider this an extended comment on (and an extension of) Adriano's answer.
OpenMP is really straightforward to master and use, and it has the nice feature that both serial and parallel executables can be produced from one and the same source code. It also allows you to take a gradual parallelisation path if you need to convert an existing serial code into a parallel one. OpenMP has a set of drawbacks though. First, it targets only shared-memory machines, which severely limits its scalability, although large x86 SMP machines are now available (e.g. we have QPI-coupled Xeon systems with 128 CPU cores sharing up to 2 TiB of RAM in our cluster installation, specifically targeted at large OpenMP jobs).
Second, its programming model is too simple to allow implementation of some advanced concepts. But I would say that this is a strength rather than a drawback of the model since it keeps OpenMP concise.
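To make the "same source, serial or parallel executable" point concrete, here is a minimal sketch (the arrays and their size are made up for illustration). Built with OpenMP enabled (e.g. `-fopenmp` with GCC) the loop runs on several threads; built without it, the pragma is simply ignored and you get the serial program back:

```c++
#include <cstdio>
#include <vector>

int main() {
    const int n = 1000000;                  // problem size, arbitrary
    std::vector<double> a(n), b(n);
    for (int i = 0; i < n; ++i) b[i] = i;

    // The pragma below is ignored by a compiler invoked without OpenMP
    // support, so the very same source also builds a serial executable.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        a[i] = 2.0 * b[i];

    std::printf("a[42] = %f\n", a[42]);
    return 0;
}
```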
MPI is the de facto standard message-passing API nowadays. It is widely supported and runs on a vast variety of parallel architectures. Its distributed-memory model imposes little to no restrictions on the underlying hardware (apart from requiring a low-latency, high-bandwidth network interconnect), and this allows it to scale to hundreds of thousands of CPU cores. MPI programs are also quite portable at the source level, although the algorithms themselves might not possess portable scalability (e.g. one MPI program might run quite efficiently on a Blue Gene/P and horribly slowly on an InfiniBand cluster).

MPI has one severe drawback: its SPMD (Single Program Multiple Data) model requires a lot of schizophrenic thinking on the part of the programmer and is much harder to master than OpenMP. Porting serial algorithms to MPI is never as easy as it is with OpenMP, and sometimes a complete rewrite is necessary in order to achieve high parallel efficiency. It is also not possible to take the gradual parallelisation approach or to easily maintain a codebase that can produce both serial and parallel executables.

MPI has an interesting feature: since it completely separates the parts of the program that run on separate nodes and provides an abstract interface to the network, it allows for heterogeneous computing. Several MPI implementations (e.g. Open MPI) provide heterogeneous support, which allows one to mix not only nodes running different operating systems but also CPUs of different "bitness" and endianness.
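For contrast with the OpenMP sketch above, here is a minimal example of the SPMD style (the toy payload is invented purely for illustration): every process runs the same executable and decides from its rank whether it sends or collects, which is exactly the kind of split-personality reasoning mentioned above:

```c++
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        // Rank 0 collects one integer from every other rank.
        for (int src = 1; src < size; ++src) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            std::printf("rank 0 received %d from rank %d\n", value, src);
        }
    } else {
        // Every other rank computes some per-process result and sends it.
        int value = rank * rank;
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Launched with e.g. `mpiexec -n 4 ./a.out`, the data exchange has to be spelled out explicitly; there is no shared array to simply write into.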
Intel TBB is like OpenMP on steroids. It provides a much richer programming model based on kernels, which brings it closer to other parallel programming paradigms like CUDA or OpenCL. It draws heavily on the C++ STL algorithms in terms of applicability and extensibility. It is also supposed to be compiler neutral and in principle should work with the Intel C++ Compiler, GNU g++ and MSVC. TBB also uses a task-stealing scheduler that can potentially even out the computational imbalance that the previous paradigms are prone to if no precautions are taken.
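A rough sketch of what that kernel-based style looks like, assuming a TBB installation with the classic `tbb/` headers (newer oneTBB releases may also place them under `oneapi/tbb/`): the loop body becomes a lambda handed to `parallel_for`, and the work-stealing scheduler chops the range into chunks and balances them across worker threads:

```c++
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>
#include <cstdio>

int main() {
    const size_t n = 1000000;               // problem size, arbitrary
    std::vector<double> a(n), b(n);
    for (size_t i = 0; i < n; ++i) b[i] = static_cast<double>(i);

    // The lambda is the "kernel"; TBB decides how the blocked_range is
    // subdivided and which worker thread steals which chunk.
    tbb::parallel_for(tbb::blocked_range<size_t>(0, n),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                a[i] = 2.0 * b[i];
        });

    std::printf("a[42] = %f\n", a[42]);
    return 0;
}
```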
Pthreads is the portable threading interface of most modern Unix-like systems (e.g. FreeBSD, Mac OS X, Linux). It is just a threading library and is geared towards the most general use cases one can imagine. It provides hardly any parallel constructs, and one has to program them explicitly on top of it; e.g. even a simple distribution of loop iterations à la OpenMP has to be hand-coded, as sketched below. Pthreads is to Unix exactly what Win32 threads are to Windows.
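To show what "hand-coded" means, here is a sketch of the iteration split that a single OpenMP pragma gives you for free (the thread count and arrays are arbitrary choices for the example; compile with `-pthread`):

```c++
#include <pthread.h>
#include <vector>
#include <cstdio>

const int kThreads = 4;                     // arbitrary thread count
const int kN = 1000000;                     // arbitrary problem size
std::vector<double> a(kN), b(kN);

// Each thread processes a contiguous chunk of the index space; the
// partitioning that "#pragma omp parallel for" performs automatically
// has to be written out by hand.
void* worker(void* arg) {
    int id = *static_cast<int*>(arg);
    int chunk = kN / kThreads;
    int begin = id * chunk;
    int end = (id == kThreads - 1) ? kN : begin + chunk;
    for (int i = begin; i < end; ++i)
        a[i] = 2.0 * b[i];
    return nullptr;
}

int main() {
    for (int i = 0; i < kN; ++i) b[i] = i;

    pthread_t threads[kThreads];
    int ids[kThreads];
    for (int t = 0; t < kThreads; ++t) {
        ids[t] = t;
        pthread_create(&threads[t], nullptr, worker, &ids[t]);
    }
    for (int t = 0; t < kThreads; ++t)
        pthread_join(threads[t], nullptr);

    std::printf("a[42] = %f\n", a[42]);
    return 0;
}
```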
(I would skip Microsoft TPP since I don't really know that library)
Mixing those concepts is clearly the way of the future, as single nodes are progressively getting more and more cores. Multiple levels of parallelism are possible with most algorithms, and one can use MPI to perform the coarse-grained parallelism (running on multiple cluster nodes) while OpenMP or TBB handles the fine-grained division of the work within each node. Shared-memory programming can usually utilise memory resources better, since all data is shared between the threads, and things like cache reuse can speed up the calculations considerably. MPI can also be used to program a multicore SMP or NUMA machine, but each MPI process is a separate OS process with its own virtual address space, which means that lots of (configuration) data might need to be replicated. The MPI people are working on improvements to the standard that would allow MPI processes to run as threads, and "MPI endpoints" might end up in the forthcoming version 3.0 of the MPI standard.
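A skeletal example of such a hybrid setup (assuming an MPI library that provides at least `MPI_THREAD_FUNNELED` support): each MPI rank owns a share of the work and uses OpenMP threads to churn through it in shared memory, and only the partial results travel over the network:

```c++
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    // Request a threading level that allows OpenMP threads inside each rank
    // as long as only the main thread makes MPI calls.
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Fine grain: OpenMP threads share this process's memory.
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < 1000000; ++i)
        local_sum += 1.0 / (1.0 + i + rank);

    // Coarse grain: MPI combines the per-rank results across the cluster.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("global sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```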
I would suggest picking the one that is closest to your programming background. If you are an avid C++ programmer and breathe abstractions, then pick Intel TBB (or Microsoft PPL if you are into .Net). OpenMP is really easy to master and provides good performance, but is somewhat simplistic. It is still the only widely available and used mechanism for writing multithreaded code in Fortran. MPI has a steep learning curve, but it can always be bolted on later if your program outgrows what a single compute node can provide.