I have a GraphFrame with the vertices and edges shown below. I am running this with PySpark in a Jupyter notebook.
vertices = sqlContext.createDataFrame([
    ("12345", "Alice", "Employee"),
    ("15789", "Bob", "Employee"),
    ("13467", "Charlie", "Manager"),
    ("14890", "David", "Director"),
    ("17737", "Fanny", "CEO")], ["id", "name", "title"])
edges = sqlContext.createDataFrame([
    ("12345", "13467", "works"),
    ("15789", "13467", "works"),
    ("13467", "14890", "works"),
    ("14890", "17737", "works"),
], ["src", "dst", "relationship"])
I need to find the hierarchical path of each emp_id up to the highest level (which is the CEO in this case). I am trying the BFS approach, and so far I have only managed to get the path for a single emp_id. Below is my code.
from graphframes import GraphFrame

g = GraphFrame(vertices, edges)
# BFS from one employee to the CEO, following only "works" edges
result = g.bfs(fromExpr="id == '12345'", toExpr="title == 'CEO'", edgeFilter="relationship == 'works'", maxPathLength=5)
result.show(5, False)
Output:
+----------------------+-------------------+-----------------------+-------------------+----------------------+-------------------+-----------------+
|from |e0 |v1 |e1 |v2 |e2 |to |
+----------------------+-------------------+-----------------------+-------------------+----------------------+-------------------+-----------------+
|[12345,Alice,Employee]|[12345,13467,works]|[13467,Charlie,Manager]|[13467,14890,works]|[14890,David,Director]|[14890,17737,works]|[17737,Fanny,CEO]|
+----------------------+-------------------+-----------------------+-------------------+----------------------+-------------------+-----------------+
I can store this result in a variable and extract the values using the collect() method. I want to loop through all the ids in vertices that have a path to the CEO and write the paths to a DataFrame. If anyone is familiar with GraphFrames, could you please help me with this? I have looked into other solutions, but none of them work in my case.
Expected Output:
+-------+--------------------------+
|user_id|path |
+-------+--------------------------+
|12345 |12345->13467->14890->17737|
|15789 |15789->13467->14890->17737|
|13467 |13467->14890->17737 |
|14890 |14890->17737 |
|17737 |17737 |
+-------+--------------------------+
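To make the expected output concrete, here is a minimal plain-Python sketch of the logic I am after. It does not use Spark or GraphFrames at all, and it assumes every employee has at most one outgoing "works" edge (true for the sample data, but not guaranteed in general). The names manager_of, path_to_top and rows are just illustrative.

```python
# Plain-Python sketch of the expected result (no Spark involved).
# Assumption: every employee has at most one outgoing "works" edge.
edges = [
    ("12345", "13467", "works"),
    ("15789", "13467", "works"),
    ("13467", "14890", "works"),
    ("14890", "17737", "works"),
]
vertex_ids = ["12345", "15789", "13467", "14890", "17737"]

# Map each employee id to the id of the person they report to.
manager_of = {src: dst for src, dst, rel in edges if rel == "works"}


def path_to_top(emp_id, manager_of):
    """Follow src -> dst links upward until no manager is left."""
    path = [emp_id]
    seen = {emp_id}
    while path[-1] in manager_of:
        nxt = manager_of[path[-1]]
        if nxt in seen:  # guard against accidental cycles in the data
            break
        path.append(nxt)
        seen.add(nxt)
    return "->".join(path)


# One (user_id, path) pair per vertex, in the same order as vertex_ids.
rows = [(v, path_to_top(v, manager_of)) for v in vertex_ids]
for user_id, path in rows:
    print(user_id, path)
```

On a small graph, the rows list could presumably be turned into a DataFrame with sqlContext.createDataFrame(rows, ["user_id", "path"]), but this approach only works when the edges fit in driver memory, which is why I am looking for a GraphFrames-based way to do the same thing.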