What makes you think you can't mix and match these? Also, you suffer from some misconceptions about what OpenGL, Qt and Wayland are. In general, the question is far too broad to answer satisfyingly. I already gave a lengthy explanation here: https://stackoverflow.com/a/18006486/524368
In short:
- Wayland: A protocol to send framebuffer handles plus user input and interaction events between processes (nothing more)
- Wayland compositor: A program that creates framebuffers and hands them to clients using the Wayland protocol
- Wayland client: A program that draws to those framebuffers by whatever method it likes
- Weston: A Wayland compositor that creates framebuffers as on-screen windows for clients to draw to
- X11: A graphics and user input server protocol. Clients connect to an X11 server and send it X11 drawing commands, which the server executes against a framebuffer (a drawable in X11 terms). Traditionally X11 draws directly to the screen, but given the right X11 server (Xwayland) it can draw to a Wayland framebuffer as well. X11 also multiplexes user input (mouse, keyboard, joystick, and so on) to the connected clients.
- X11 server: Implements the display-side backend of an X11 system.
- X11 client: Connects to an X11 server and sends it drawing commands.
- OpenGL: A framebuffer-oriented drawing API. Essentially, OpenGL provides methods to efficiently draw points, lines and triangles to a framebuffer. The drawing process is either fixed-function, i.e. a hardwired set of operations (old-style OpenGL), or freely programmable (modern, shader-based OpenGL). OpenGL drawing commands can be sent to an X11 server that implements GLX; in that case the OpenGL commands are executed by the X11 server and the client just sends it a stream of commands. More often, though, OpenGL commands are executed directly by the process using the OpenGL API and go straight to the GPU, which processes them into pixels in a framebuffer. A minimal old-style example follows after this list.
- Toolkits like Qt or GTK: Provide abstractions for building user interfaces. Windows and widgets are logical structures that organize the contents of a framebuffer. User input is processed in an event system which the programmer uses to control the facilities provided by the toolkit. User interface elements are drawn by whatever means are available to the toolkit: either drawing commands native to the graphics system in use (X11 drawing commands) or, if there is no native set of drawing commands, as in Wayland, whatever fits the bill. So OpenGL is a viable option (and is used as such) for Qt to draw its user interface elements, if the runtime environment of the process is well suited for it.
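To make "OpenGL just draws points, lines and triangles to a framebuffer" concrete, here is a minimal old-style (fixed-function) sketch. It assumes freeglut (or classic GLUT) is installed; GLUT supplies the window, context and event loop that OpenGL itself deliberately leaves to the surrounding system:

    // Minimal old-style OpenGL: one triangle into a framebuffer.
    // Assumes freeglut (or classic GLUT) for window/context setup.
    #include <GL/glut.h>

    void display(void) {
        glClear(GL_COLOR_BUFFER_BIT);
        glBegin(GL_TRIANGLES);     // OpenGL's job starts here:
        glVertex2f(-0.5f, -0.5f);  // three vertices in ...
        glVertex2f( 0.5f, -0.5f);
        glVertex2f( 0.0f,  0.5f);  // ... one triangle out
        glEnd();                   // ... and ends here
        glutSwapBuffers();         // present the framebuffer
    }

    int main(int argc, char *argv[]) {
        glutInit(&argc, argv);
        glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);
        glutCreateWindow("triangle");
        glutDisplayFunc(display);
        glutMainLoop();            // GLUT, not OpenGL, runs the event loop
        return 0;
    }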
> How do normally people make GUI programs with a mix of windowed GUI and 3D rendering? i.e. SolidWorks
They use a toolkit like Qt, and where 3D content must be drawn, OpenGL (or a similar API) is used.
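Here is a minimal sketch of that combination, assuming Qt 5.4 or later (which provides QOpenGLWidget); the `Viewport` class name and the empty draw section are placeholders, the point being that the 3D view is just another widget inside an ordinary windowed GUI:

    // Mixing a windowed GUI with 3D rendering, SolidWorks-style.
    // Assumes Qt >= 5.4 with the Widgets module.
    #include <QApplication>
    #include <QMainWindow>
    #include <QOpenGLWidget>
    #include <QOpenGLFunctions>

    // Hypothetical viewport: an ordinary Qt widget on the outside,
    // an OpenGL framebuffer on the inside.
    class Viewport : public QOpenGLWidget, protected QOpenGLFunctions {
    protected:
        void initializeGL() override {
            initializeOpenGLFunctions();
            glClearColor(0.2f, 0.2f, 0.25f, 1.0f);
        }
        void paintGL() override {
            glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
            // ... issue point/line/triangle draw calls for the 3D model here ...
        }
    };

    int main(int argc, char *argv[]) {
        QApplication app(argc, argv);
        QMainWindow window;
        window.setCentralWidget(new Viewport); // 3D view embedded among normal widgets
        window.resize(800, 600);
        window.show();
        return app.exec();
    }

Everything around the central widget (menus, toolbars, dialogs) is drawn by Qt; only the inside of the Viewport is drawn with OpenGL calls.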
> It seems that OpenGL is tailored more towards rendering and creating 3D Objects/Worlds which would be more like a game, for lack of a better description.
OpenGL just draws points, lines and triangles.
There are no "scenes" or "worlds" in OpenGL. Everything that looks like a scene or a game environment comes from the program logic using OpenGL in such a way that the outcome looks like one.
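A minimal sketch of that separation, with hypothetical names, assuming a current modern OpenGL context and a function loader such as GLEW already initialized: the "scene" lives entirely in the application's data structures, and OpenGL only ever sees triangles.

    // Hypothetical scene drawing: the scene is application data,
    // OpenGL just receives triangle draw calls.
    // Assumes a current modern GL context and GLEW (glewInit already called).
    #include <GL/glew.h>
    #include <vector>

    struct Mesh {
        GLuint  vao;    // vertex array object holding this object's triangle data
        GLsizei count;  // number of vertices to draw
    };

    void drawScene(const std::vector<Mesh> &scene) {
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        for (const Mesh &m : scene) {                // the loop IS the "scene" logic
            glBindVertexArray(m.vao);
            glDrawArrays(GL_TRIANGLES, 0, m.count);  // OpenGL only sees triangles
        }
    }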
Also, OpenGL doesn't do windows and it doesn't do user input. Use a GUI toolkit for the whole widget business and use OpenGL to draw your scene.