Your idea is a good one: multiple-camera setups are a proven way to increase coverage of the captured human body and to minimize occlusions.
Please go through the document "Benefits of using multiple Azure Kinect DK devices", in particular the "Fill in occlusions" section. Although the Azure Kinect DK data transformations produce a single image, the two cameras (depth and RGB) are actually a small distance apart, and this offset is what makes occlusions possible. The approach: use the Kinect SDK to capture the depth data from both devices and store it in separate matrices, then align the two matrices using a 3D registration algorithm. This maps the data from one device into the other device's coordinate frame, taking the relative position and orientation of each device into account.
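As a rough illustration, here is a minimal sketch of that capture-and-register pipeline in Python. It assumes the third-party pyk4a wrapper for capture and Open3D for registration; the device indices, camera intrinsics, and ICP distance threshold below are placeholders you would replace with values from your own setup and device calibration.

```python
import open3d as o3d
from pyk4a import PyK4A

# Open both devices (indices are placeholders for your setup).
master = PyK4A(device_id=0)
subordinate = PyK4A(device_id=1)
master.start()
subordinate.start()

# Grab one depth frame from each device (uint16 numpy arrays, millimeters).
depth_a = master.get_capture().depth
depth_b = subordinate.get_capture().depth

# Placeholder intrinsics (roughly NFOV unbinned) -- read the real values
# from each device's calibration instead.
intrinsics = o3d.camera.PinholeCameraIntrinsic(640, 576, 504.0, 504.0, 320.0, 288.0)

def to_cloud(depth_mm):
    # Back-project the depth map into a 3D point cloud (mm -> meters),
    # then downsample to keep ICP fast.
    img = o3d.geometry.Image(depth_mm)
    cloud = o3d.geometry.PointCloud.create_from_depth_image(
        img, intrinsics, depth_scale=1000.0)
    return cloud.voxel_down_sample(0.01)

cloud_a = to_cloud(depth_a)
cloud_b = to_cloud(depth_b)

# 3D registration: ICP estimates the rigid transform that maps device B's
# coordinate system onto device A's (0.05 m correspondence threshold).
result = o3d.pipelines.registration.registration_icp(cloud_b, cloud_a, 0.05)
print(result.transformation)  # 4x4 extrinsic matrix (B -> A)
```

Note that plain ICP started from an identity transform only converges when the two views already overlap substantially; in practice you would seed it with a coarse estimate from a global registration step or a calibration target.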

Please also refer to this article by Nadav Eichler: "Spatio-Temporal Calibration of Multiple Kinect Cameras Using 3D Human Pose".
Quoted:
When using multiple cameras, two main requirements must be fulfilled in order to fuse the data across cameras:
- Camera Synchronization (alignment between the cameras’ clocks).
- Multi-Camera Calibration (calculating the mapping between cameras’ coordinate systems).
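On the first requirement, the Azure Kinect DK supports hardware synchronization over its 3.5 mm sync ports when the devices are daisy-chained with an audio cable. Below is a minimal sketch of a master/subordinate configuration, again assuming the pyk4a wrapper (the Config field names mirror the SDK's k4a_device_configuration_t; the 160 µs delay follows the multi-device documentation's recommendation to offset the depth captures so the IR lasers do not interfere):

```python
from pyk4a import Config, PyK4A, WiredSyncMode

# Master device: drives the shared trigger over the sync cable.
master = PyK4A(Config(wired_sync_mode=WiredSyncMode.MASTER), device_id=0)

# Subordinate device: listens on its "Sync In" port. The delay offsets
# its depth capture from the master's to avoid IR laser interference.
sub = PyK4A(
    Config(
        wired_sync_mode=WiredSyncMode.SUBORDINATE,
        subordinate_delay_off_master_usec=160,
    ),
    device_id=1,
)

# Start the subordinate first so it is already waiting for sync pulses
# when the master begins emitting them.
sub.start()
master.start()

# Matching frames can now be paired using their device timestamps.
cap_m, cap_s = master.get_capture(), sub.get_capture()
```

With synchronization handled in hardware and the extrinsic transform recovered by registration (or by the pose-based calibration the article describes), you can fuse the depth data from both devices into a single, more complete reconstruction of the body.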
