By default, CUDA has a single per-process default stream. There is an nvcc compiler flag, `--default-stream per-thread`, which changes the behaviour to one default stream per host thread; see the documentation.
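As a minimal sketch of the difference (the `spin` kernel and the two worker threads are my own illustration, not from the documentation), the program below launches one kernel from each of two host threads. Compiled with `nvcc --default-stream per-thread`, each thread gets its own default stream and the kernels may overlap; compiled without the flag, both launches serialize on the single per-process default stream:

```c++
// Build: nvcc --default-stream per-thread -std=c++11 example.cu -o example
#include <cstdio>
#include <thread>

// Busy-wait kernel, so that overlap (or the lack of it) is visible in a profiler.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

void worker() {
    // With --default-stream per-thread, this launch goes into the calling
    // host thread's own default stream.
    spin<<<1, 1>>>(1000000000LL);
    cudaStreamSynchronize(cudaStreamPerThread);
}

int main() {
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    cudaDeviceSynchronize();
    return 0;
}
```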
Note that streams and host threads are programming-level abstractions over the hardware. Even within a single process, the number of streams whose work can actually execute concurrently is limited by the hardware: on the Fermi architecture, all streams were multiplexed into a single hardware work queue, but since Kepler there are 32 separate hardware queues (see CUDA Streams: Best Practices and Common Pitfalls).
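As a hedged illustration (the `work` kernel and the choice of eight streams are mine), the launches below go into separate, explicitly created streams; on Fermi they would still funnel through the single hardware queue and could suffer false dependencies, while on Kepler and later they can map onto distinct Hyper-Q queues and run concurrently:

```c++
#include <cstdio>

__global__ void work() {
    // Placeholder kernel body.
}

int main() {
    const int n = 8;
    cudaStream_t streams[n];
    for (int i = 0; i < n; ++i)
        cudaStreamCreate(&streams[i]);

    // Each launch is enqueued into its own stream, so the runtime is free
    // to execute them concurrently if the hardware queues allow it.
    for (int i = 0; i < n; ++i)
        work<<<1, 64, 0, streams[i]>>>();

    cudaDeviceSynchronize();
    for (int i = 0; i < n; ++i)
        cudaStreamDestroy(streams[i]);
    return 0;
}
```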
Since the programming guide does not discuss multiple processes in this part, I believe these abstractions do not define the behaviour of multi-process scenarios. For multiple processes, the relevant term is "CUDA context": when using the runtime API, one context (the primary context) is created per device for each process and is shared among all host threads of that process. As for how many contexts can be active on a device at the same time, the guide says in 3.4 Compute modes that in the default mode, "Multiple host threads can use the device". Since the description of the following exclusive-process mode talks about CUDA contexts instead, I assume this means that the default mode covers host threads from multiple processes as well.
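The compute mode can be inspected from the runtime API; a minimal sketch (querying device 0 is an assumption):

```c++
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    switch (prop.computeMode) {
        case cudaComputeModeDefault:
            printf("Default: multiple host threads and processes may use the device\n");
            break;
        case cudaComputeModeExclusiveProcess:
            printf("Exclusive-process: one context, owned by a single process\n");
            break;
        case cudaComputeModeProhibited:
            printf("Prohibited: no contexts can be created on this device\n");
            break;
        default:
            printf("Other compute mode: %d\n", prop.computeMode);
            break;
    }
    return 0;
}
```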
For more info about multi-process concurrency see e.g. How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?, Unleash legacy MPI codes with Kepler's Hyper-Q, and CUDA Streams: Best Practices and Common Pitfalls.
Finally, note that multi-process concurrency has worked this way since the Kepler architecture, which is the oldest architecture still supported today. Since the Pascal architecture, there is additionally support for compute preemption (see 3.4 Compute modes for details).
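Whether a device supports compute preemption can also be checked at runtime; a minimal sketch (device 0 is an assumption):

```c++
#include <cstdio>

int main() {
    int supported = 0;
    // cudaDevAttrComputePreemptionSupported reports 1 on Pascal and newer devices.
    cudaDeviceGetAttribute(&supported, cudaDevAttrComputePreemptionSupported, 0);
    printf("Compute preemption supported: %s\n", supported ? "yes" : "no");
    return 0;
}
```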