First, you should not treat whatever document that comes from as listing the only options for using Vulkan. There are many ways to use Vulkan, and those are just a few that NVIDIA was looking at in whatever presentation that comes from.
So I concluded that using only Primary is better than using Secondary buffer, but, all samples seem to be using secondary buffer in multi-threaded rendering.
Case in point: those samples aren't doing the same thing as what NVIDIA is doing.
The two NVIDIA methods are:
1. Generate a single completely static and unchanging command buffer. It presumably contains everything that can ever be rendered ever. Presumably, you use memory (indirect rendering, UBO data, etc) to control where they appear, how to draw more than one, or to prevent them from being drawn at all.
2. Generate a single completely static and unchanging command buffer for each object. Presumably, you use memory to control where they appear. You then render these objects into a primary command buffer.
Neither of these is threaded at all.
When people talk about threading rendering, what they're talking about is threading the creation of command buffers in the render loop. #2 doesn't create secondary command buffers during the render loop; they're static per object.
A typical threaded rendering system creates secondary command buffers that each contain multiple objects, and which will be executed from the primary command buffer later in that frame. Each thread records some number of objects into its own command buffer. That's what those threading samples are doing.
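To make the shape of that concrete, here is a minimal sketch of the per-frame pattern. It does not call Vulkan at all: `SecondaryBuffer`, `recordRange`, and `buildFrame` are made-up names, and a list of strings stands in for a real command buffer. The comments note which Vulkan calls would sit at each step in a real renderer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Stand-in for a secondary command buffer: just the list of recorded draws.
using SecondaryBuffer = std::vector<std::string>;

// Records draws for objects [first, last) into one thread-owned buffer.
// In real Vulkan this would be vkBeginCommandBuffer, some vkCmdDraw* calls,
// and vkEndCommandBuffer on a secondary VkCommandBuffer allocated from a
// VkCommandPool that only this thread uses.
void recordRange(const std::vector<std::string>& objects,
                 std::size_t first, std::size_t last, SecondaryBuffer& out) {
    for (std::size_t i = first; i < last; ++i) {
        out.push_back("draw " + objects[i]);
    }
}

// Builds one frame: splits the objects across threadCount threads, waits for
// them, then appends the secondaries to the primary in a fixed order (the
// role vkCmdExecuteCommands plays in Vulkan).
std::vector<std::string> buildFrame(const std::vector<std::string>& objects,
                                    std::size_t threadCount) {
    assert(threadCount > 0);
    std::vector<SecondaryBuffer> secondaries(threadCount);
    std::vector<std::thread> workers;
    const std::size_t chunk = (objects.size() + threadCount - 1) / threadCount;

    for (std::size_t t = 0; t < threadCount; ++t) {
        const std::size_t first = std::min(t * chunk, objects.size());
        const std::size_t last = std::min(first + chunk, objects.size());
        workers.emplace_back(recordRange, std::cref(objects), first, last,
                             std::ref(secondaries[t]));
    }
    for (std::thread& worker : workers) {
        worker.join();
    }

    std::vector<std::string> primary;
    for (const SecondaryBuffer& secondary : secondaries) {
        primary.insert(primary.end(), secondary.begin(), secondary.end());
    }
    return primary;
}
```

Calling `buildFrame(objects, 4)` once per frame is the part that gets re-done every frame. That is the difference from NVIDIA's per-object case, where the secondary buffers are recorded once and reused.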
So you're comparing apples to oranges here.
If I should support multiple queues, do I have to use all the multiple queues? (multi-thread -> generate command buffer -> submit multiple queues) or using single queue is fine?
Use whatever you feel you need. Parallel queue operations tend to be for things like memory transfers or complex compute operations that are going to do work for the next frame. Rendering tends to be something you need to happen in a specific order. And inter-queue dependencies for separate rendering operations can be very difficult to manage.
Also, remember that Vulkan runs on lots of hardware. In particular, it only requires that hardware provide a single queue. So even if you do some multi-queue programming, you still need a path to support single-queue systems.
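As a rough sketch of what that fallback path could look like, the function below picks a graphics queue family and, if one exists, a separate transfer-only family; otherwise it reuses the graphics family for transfers (graphics-capable queues can always do transfer work). `QueueFamily`, `QueueChoice`, and `chooseQueues` are hypothetical names. `QueueFamily` just mirrors the `queueFlags` and `queueCount` fields of `VkQueueFamilyProperties`, which you would fill in from `vkGetPhysicalDeviceQueueFamilyProperties` in real code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same bit values as VK_QUEUE_GRAPHICS_BIT and VK_QUEUE_TRANSFER_BIT.
constexpr std::uint32_t kGraphicsBit = 0x1;
constexpr std::uint32_t kTransferBit = 0x4;

// Hypothetical mirror of the two VkQueueFamilyProperties fields used here.
struct QueueFamily {
    std::uint32_t queueFlags;
    std::uint32_t queueCount;
};

// Result of the selection; -1 means "no such family found".
struct QueueChoice {
    int graphicsFamily = -1;
    int transferFamily = -1;
    bool separateTransfer = false;
};

QueueChoice chooseQueues(const std::vector<QueueFamily>& families) {
    QueueChoice choice;

    // First graphics-capable family with at least one queue.
    for (std::size_t i = 0; i < families.size(); ++i) {
        if (families[i].queueCount > 0 &&
            (families[i].queueFlags & kGraphicsBit) != 0) {
            choice.graphicsFamily = static_cast<int>(i);
            break;
        }
    }

    // Prefer a family that does transfers but not graphics, if there is one.
    for (std::size_t i = 0; i < families.size(); ++i) {
        const bool transfer = (families[i].queueFlags & kTransferBit) != 0;
        const bool graphics = (families[i].queueFlags & kGraphicsBit) != 0;
        if (families[i].queueCount > 0 && transfer && !graphics) {
            choice.transferFamily = static_cast<int>(i);
            choice.separateTransfer = true;
            break;
        }
    }

    // Single-queue path: do transfers on the graphics queue.
    if (!choice.separateTransfer) {
        choice.transferFamily = choice.graphicsFamily;
    }
    return choice;
}
```

The rest of the renderer can then branch on `separateTransfer`: when it is false, uploads simply go through the same queue as rendering and no cross-queue synchronization is needed.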