0

we have a system where the customer comes and interacts, triggers jobs and does many actions. We have 1000s of such users. Each job has a name and our backend database has all the data about the customer interactions.

These jobs fail often. We know why a particular job failed based on its inputs, but now we want to find what was the path taken by user (journey) before he reached the failure job. We want to see if we can improve the experience much before so that the failure is avoided.

Example (hypothetical), login->create file-> save file -> download file. Download file is failing with some error. Say this usually happens when a save has just completed. If you have done some operation between save file and download, then down load does not fail. That is a hidden bug possibly.

The question is - Given a history of 3000 users graph traversal (take paths of size 5 [as a moving window]) build a system that when asked **

"what are the most probable paths to reach node X"

gives the top 5 most probably paths to reach X.

I have created the nodes as [jobName][State], example, loginSuccess->createFileSuccess->SaveFileSuccess->DownloadFailed. X will be typically a [Job Name]Failed node that we will query. We have about 50 jobs and 3 states, success, failed cancelled.

Any idea how to build this model, which algorithm to use, and how to reverse generate the probabilities when a node is asked?

Adding some more clarity -

Given a target node, can I list what were the most probable paths to reach it with length 5. I dont know the starting points to start the dijkstra's. Also a direct path of low probability might exit from a given starting node, directly to the target node, but I need to find paths of length 5.

  • You just need to find all path between the two Nodes (Login, X ).All the path you can use DFS with slight modification to just find all paths between the two nodes . http://www.geeksforgeeks.org/find-paths-given-source-destination/ – Hariom Singh Jul 28 '17 at 04:42
  • You can create each node with three state ..and traverse only from success node .treat each node as three node ..so start would be (Login_sucess_ node to x_success_Node) – Hariom Singh Jul 28 '17 at 04:46

2 Answers2

0

The first step I would take would be to construct a list of records of length 5, where each such record contains the 5 steps taken by a particular customer leading up to node X. Then you could simply sort this list and count the number of times each possible record occurs in it, to work out the most popular records.

Another approach would be to assign each edge exiting a node a score which was the fraction of paths that exited that node to exit it via that edge. Then compute the overall score for a path by multiplying together the scores for its edges, and again take the observed paths with highest scores.

mcdowella
  • 19,301
  • 2
  • 19
  • 25
0

From what I have understood you need to find path most likely followed by users and you can make nodes for each process and two process are connected to each other if a customer goes from one process to other.

STEP 1. Construct a graph for all 3000 users which will be a weighted graph 
        (as such weight of an edge will be number of times a user goes from 
        one process to another, so each time you find an already built edge 
        increment its weight by 1 or else make a new edge with weight =1)

Now, to find most probable path from source node to another

STEP 2. Apply Dijkstra's algorithm but with small change.
        Dijkstra's algo find smallest path from one node to every other 
        node,so you need to find maximum path from one node to another.

I think, it should work as all the edges have positive weight and it will give you the most probable path taken from one node to another by all users and you could easily get data of all nodes from source to destination node very easily.

But it will only give you the most probable path and not top 5 of them.

  • Thanks - however, my question is given a target node, can I list what were the most probable paths to reach it with length 5. I dont know the starting points to stat the dijkstra's. Also a direct path of low probability might exit from a given starting node, directly to the target node, but I need to find paths of length 5. – SpringCoder Jul 28 '17 at 09:26
  • Ok, if you need to find that then while making graph if a user goes from process1 to process2 just make directed edge from p2 to p1 and then apply Djkstra's on any node that you need to find the path from and while traversing keep the count for each node the distance from source node....If you find a node with distance 5 from source that will be the first node with most probability and similarly other nodes could be found. – Prateek Mirdha Jul 28 '17 at 12:46