So When I create user level threads, Kernels are not aware of them
This is not entirely true. At least not in the way you are thinking.
A user space thread library can choose to implement threads that do not map one-to-one onto kernel threads, but each of these user space threads still runs on a kernel thread whenever it does run. This just means that the library can do its own scheduling in user space and map the user space threads it deems ready to run onto kernel threads from a pool. In this sense the kernel is not aware of the user space threads, but it is very much aware of the kernel threads that are used to run them.
The kernel manages and schedules the kernel threads. It can run them on multiple CPUs if it so decides. In doing so, it causes the user space threads mapped to such kernel threads to run on these CPUs as well.
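The pool idea can be sketched very roughly in C. This is a heavily simplified illustration, not how any particular library works: run-to-completion tasks stand in for user space threads (a real library would give each one its own stack and let it yield), and the names `user_task`, `kernel_worker`, `run_pool`, and the two constants are hypothetical. The point is only that the kernel sees three pthreads, while the "library" decides which task runs on which of them.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_USER_TASKS     8  /* hypothetical: tasks standing in for user space threads */
#define NUM_KERNEL_THREADS 3  /* hypothetical: size of the kernel thread pool */

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_task;                /* index of the next ready task in the run queue */
static int ran_on[NUM_USER_TASKS];   /* which pool worker ended up running each task */

/* Stand-in for the body of a user space thread. */
static void user_task(int task_id, int worker_id)
{
    ran_on[task_id] = worker_id;
}

/* Each kernel thread repeatedly takes the next ready task from the
 * user space run queue and runs it. The kernel only schedules this
 * function; it knows nothing about the individual tasks. */
static void *kernel_worker(void *arg)
{
    int worker_id = *(int *)arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        int task_id = (next_task < NUM_USER_TASKS) ? next_task++ : -1;
        pthread_mutex_unlock(&queue_lock);
        if (task_id < 0)
            break;                   /* run queue is empty */
        user_task(task_id, worker_id);
    }
    return NULL;
}

/* Runs all tasks on the pool and returns how many completed. */
int run_pool(void)
{
    pthread_t threads[NUM_KERNEL_THREADS];
    int ids[NUM_KERNEL_THREADS];
    int completed = 0;

    next_task = 0;
    for (int i = 0; i < NUM_USER_TASKS; i++)
        ran_on[i] = -1;

    for (int i = 0; i < NUM_KERNEL_THREADS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, kernel_worker, &ids[i]);
    }
    for (int i = 0; i < NUM_KERNEL_THREADS; i++)
        pthread_join(threads[i], NULL);

    for (int i = 0; i < NUM_USER_TASKS; i++) {
        printf("user task %d ran on kernel thread %d\n", i, ran_on[i]);
        if (ran_on[i] >= 0)
            completed++;
    }
    return completed;
}
```

Calling `run_pool()` from `main` (and building with `-pthread`) runs all eight tasks; which worker picks up which task is up to the kernel's scheduling of the three pthreads and can differ from run to run.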
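This kind of user space scheduling can be seen in many systems. Goroutines in Go and virtual threads in recent versions of Java are multiplexed onto a pool of kernel threads in this way. Greenlets in Python are likewise scheduled in user space, although they run cooperatively on the kernel thread that created them rather than on a pool. Note that ordinary threads in CPython and platform threads in current Java each map to a dedicated kernel thread.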
Some pthreads implementations used to be like this too (older Solaris releases, for example), but were later changed to map each pthread to a dedicated kernel thread. It is still quite possible for a pthreads implementation to use the user space thread model.
There is another model in which threads are purely a user space abstraction that the kernel is entirely unaware of. For example, it is possible to use getcontext() and setcontext(), together with makecontext() and swapcontext(), to implement user space threads that live inside a single process and share one kernel thread.
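A minimal sketch of that last model, assuming a Linux/glibc system where the `<ucontext.h>` functions are available (they are obsolescent in POSIX). The names `worker`, `run_user_threads`, and the constants are hypothetical. Two user space threads share one kernel thread and hand control back to a round-robin scheduler loop with `swapcontext()`; the kernel never sees more than one thread.

```c
#include <stdio.h>
#include <ucontext.h>

#define NUM_THREADS 2            /* hypothetical: number of user space threads */
#define STEPS       3            /* hypothetical: steps each thread performs */
#define STACK_SIZE  (64 * 1024)  /* private stack for each user space thread */

static ucontext_t scheduler_ctx;             /* context of the scheduler loop */
static ucontext_t thread_ctx[NUM_THREADS];   /* one saved context per thread */
static char stacks[NUM_THREADS][STACK_SIZE];
static int finished[NUM_THREADS];
static int trace[NUM_THREADS * STEPS];       /* order in which threads ran */
static int trace_len;

/* Body of a user space thread: do one step, then yield to the scheduler. */
static void worker(int id)
{
    for (int step = 0; step < STEPS; step++) {
        trace[trace_len++] = id;
        printf("user thread %d, step %d\n", id, step);
        swapcontext(&thread_ctx[id], &scheduler_ctx);  /* cooperative yield */
    }
    finished[id] = 1;
    /* Returning here resumes uc_link, which is the scheduler context. */
}

/* Creates the threads, runs them round robin, returns the number of steps. */
int run_user_threads(void)
{
    int remaining = NUM_THREADS;

    trace_len = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        finished[i] = 0;
        getcontext(&thread_ctx[i]);                  /* initialise the context */
        thread_ctx[i].uc_stack.ss_sp = stacks[i];
        thread_ctx[i].uc_stack.ss_size = STACK_SIZE;
        thread_ctx[i].uc_link = &scheduler_ctx;      /* where to go on return */
        makecontext(&thread_ctx[i], (void (*)(void))worker, 1, i);
    }

    /* User space scheduler: resume each unfinished thread in turn. */
    while (remaining > 0) {
        for (int i = 0; i < NUM_THREADS; i++) {
            if (finished[i])
                continue;
            swapcontext(&scheduler_ctx, &thread_ctx[i]);
            if (finished[i])
                remaining--;
        }
    }
    return trace_len;
}
```

Called from `main`, `run_user_threads()` prints the steps of thread 0 and thread 1 in alternation. Because everything happens on one kernel thread, a blocking system call in any of these user space threads would stall all of them, which is one reason libraries often combine this technique with a pool of kernel threads.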