We have a system where customers interact, trigger jobs, and perform many other actions. We have thousands of such users. Each job has a name, and our backend database stores all the data about customer interactions.
These jobs fail often. We know why a particular job failed based on its inputs, but now we want to find the path the user took (the journey) before reaching the failed job. We want to see whether we can improve the experience earlier in the journey so that the failure is avoided.
Example (hypothetical): login -> create file -> save file -> download file. Download file is failing with some error. Say this usually happens when a save has just completed; if you do some other operation between save file and download file, then the download does not fail. That is possibly a hidden bug.
The question is: given a history of graph traversals for 3000 users (taking paths of size 5 as a moving window), build a system that, when asked
"what are the most probable paths to reach node X",
gives the top 5 most probable paths to reach X.
I have created the nodes as [jobName][State], for example loginSuccess->createFileSuccess->SaveFileSuccess->DownloadFailed. X will typically be a [Job Name]Failed node that we query. We have about 50 jobs and 3 states: success, failed, and cancelled.
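To make the setup concrete, here is a minimal sketch of how the node labels and the moving windows of size 5 could be built. The names `to_node` and `windows`, and the sample events, are hypothetical and only for illustration; they are not taken from our actual system.

```python
from typing import Iterator, List, Tuple

WINDOW = 5  # path length used in the question


def to_node(job_name: str, state: str) -> str:
    """Join a job name and its state into one node label, e.g. 'downloadFailed'."""
    return f"{job_name}{state}"


def windows(journey: List[str], size: int = WINDOW) -> Iterator[Tuple[str, ...]]:
    """Yield every run of `size` consecutive nodes in one user's journey."""
    for start in range(len(journey) - size + 1):
        yield tuple(journey[start:start + size])


# Hypothetical journey of one user, in chronological order.
events = [
    ("login", "Success"),
    ("createFile", "Success"),
    ("saveFile", "Success"),
    ("renameFile", "Cancelled"),
    ("saveFile", "Success"),
    ("download", "Failed"),
]
journey = [to_node(job, state) for job, state in events]
paths = list(windows(journey))  # two windows of five nodes each
```

With about 50 jobs and 3 states, this gives at most 150 distinct node labels.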
Any idea how to build this model, which algorithm to use, and how to work backwards from a queried node to get the probabilities of the paths leading to it?
Adding some more clarity:
Given a target node, can I list the most probable paths of length 5 that reach it? I don't know the starting points from which to run Dijkstra's algorithm. Also, a direct path of low probability might exist from a given starting node straight to the target node, but I need to find paths of length 5.
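To pin down what the query should return, here is a brute-force sketch. It assumes "most probable" means the empirical frequency of each observed window of 5 nodes among all windows that end at the target; that is an assumption, not a fitted model. It scans backwards from each occurrence of the target, so no starting node is needed. The function name `top_paths_to` and the sample journeys are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Path = Tuple[str, ...]


def top_paths_to(
    journeys: Sequence[Sequence[str]],
    target: str,
    length: int = 5,
    k: int = 5,
) -> List[Tuple[Path, float]]:
    """Return the k most frequent windows of `length` nodes that end at `target`.

    Each probability is count(window) / count(all windows ending at target).
    """
    counts: Counter = Counter()
    for journey in journeys:
        # Only positions with at least `length - 1` predecessors can end a full window.
        for end in range(length - 1, len(journey)):
            if journey[end] == target:
                counts[tuple(journey[end - length + 1:end + 1])] += 1
    total = sum(counts.values())
    # most_common returns an empty list when nothing matched, so no division by zero.
    return [(path, n / total) for path, n in counts.most_common(k)]


# Hypothetical data, using length 3 to keep the example short.
journeys = [
    ["loginSuccess", "saveFileSuccess", "downloadFailed"],
    ["loginSuccess", "saveFileSuccess", "downloadFailed"],
    ["loginSuccess", "editFileSuccess", "downloadFailed"],
    ["loginSuccess", "saveFileSuccess", "editFileSuccess", "downloadSuccess"],
]
result = top_paths_to(journeys, "downloadFailed", length=3, k=2)
```

Occurrences of the target with fewer than 4 preceding nodes are skipped here, which matches the requirement that only paths of length 5 count.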