Most of the questions on SO are about how to do it? Like the question for which solution is sought or this one (How to get cpu-id in java?) but some of them actually ask to cause of the problem for example (Why does this fail without static_cast?) or (Why do I get a PHPDoc warning in PhpStorm over this code) that does not necessarily mean it has to be why question but the intent is asking for the reason behind the problem.
I want to get the root cause of the problem if it is expected (such as in why questions) and its solution, otherwise for how and what questions- just pick the accepted solution or most-voted ones.
How do you prepare a dataset for such task and how to solve it? I am looking for an idea. A preliminary approach will suffice to proceed. For my background, I have worked on sentiment analysis (using BOW method though), sequential network and have an idea about the latest NLP Transformers model.