MPI is a way of working with data (typically arrays of plain types such as int or double). It is an API (an interface specification, not a library) that describes functions you can use to transmit and receive data in specific communication "patterns" among a set of compute processes, potentially running on separate machines.
It also describes a way to launch a set of programs connected to each other so that these operations work, and a way for each launched process to discover how many peers it has and which one of them it is.
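A minimal sketch of what that looks like in practice (this example is mine, not from the original text; it assumes an MPI implementation such as OpenMPI or MPICH is installed, and must be started with its launcher to get more than one process): each process asks for its rank and the world size, then rank 0 sends a value to rank 1.

```c
/* Hypothetical minimal MPI program: discover peers, then do one
 * point-to-point transfer. Requires an MPI implementation to build. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);               /* join the set of launched peers */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many peers are there? */

    if (rank == 0 && size > 1) {
        int payload = 42;
        /* send one int to rank 1, with message tag 0 */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```

Note that every process runs the same program; the branching on rank is how the processes take on different roles.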
There are multiple competing implementations of MPI, such as OpenMPI and MPICH. If you write your program against the MPI specification, you can build and run it with whichever implementation is available on your compute platform. But all the processes in one job must use the same implementation, because MPI is an API only: it does not promise interoperability between implementations at runtime.
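Concretely, each implementation ships its own compiler wrapper and launcher, conventionally named as below (the file names here are hypothetical; `mpicc` and `mpirun` are the usual names in both OpenMPI and MPICH):

```shell
# Compile against whichever MPI implementation is installed;
# the mpicc wrapper supplies that implementation's headers and link flags.
mpicc hello.c -o hello

# Launch 4 connected processes. All of them, and the mpirun doing the
# launching, must come from the same MPI implementation.
mpirun -np 4 ./hello
```

Swapping implementations is then a matter of rebuilding and relaunching with the other implementation's wrappers, with no source changes.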
The reason MPI might be called a paradigm is that it requires thinking about distributed computing in a specific way that is unfamiliar to most programmers. Once you have used it for even a single "real" program, you will see that it demands a way of thinking about data structures and algorithms that differs from programming with, say, sockets or message queues.