2

I need your help on an incomplete code which I wrote to handle the issue I have.

So I have this input file :-

-----INPUT : test2.csv-----
child, parent, relation
M3,Q,P
M54,M7,P
M54,M27,E
Q,M7,P
M7,Q,E
M7,M3,P
M27,Q,E

======OUTPUT REQUIRED====
M3->Q,P
M54->M7->Q,P
M7->Q,E
M27->Q,E

==================PROBLEM EXPLANATION===================

Q is the ultimate parent. I want to trace all childs back to Q (the shortest path to parent 'Q'). For example for the first row, the output should be =

M3->Q, P

But second row, M54 is child of M7 with a relation tag of 'P' (M54->M7, P), but we need to traverse M7 to the ultimate parent who is 'Q'. As we search through along the csv file for M7's parent, we see that from row 5 & 6, M7 can have either 'M3' as his parent or 'Q' as his parent. So we have two paths tracing to the ultimate parent Q :-

M54->M7->Q,PE & M54->M7->M3->Q,PPP

But we would like to have the shortest path only , i.e.

M54->M7->Q,PE 

Another issue is that we also have cyclic paths, eg consider row 4 & 5 :-

Q,M7,P
M7,Q,E

So in such a case we would like the output as M7->Q,E (rather than Q->M7->Q as one would expect).

This is the code Ive come up with so far :-

# Read the csv file
import csv
z=[]
with open('test2.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        z.append(row)        

# Now generate list-of-list to store relations. Eg [['M7', 'Q', ['P']],.....
for i in range(len(z)):
    t=z[i].pop()
    z[i].append([t])

# Now lets populate the list   
t=[]    
out=[]
for i in range(len(z)):
    child = z[i][0]
    if z[i][1]=='Q': #whos the parent ?
            out.append(z[i])
            continue
    t.append(z[i])
    index=t.index(z[i])
    print (index)
    parent=t[index][1]
    print (parent)
    length=len(t[index])
    for j in range(len(z)):
        if z[j][0] == child:
            if z[j][1]=='Q':
                t[index].insert(length,'Q')

#print (parent)
print ("temp=",t)
print ("out=",out)

1 Answers1

0

This is a solution heavily based on this essay on Python.org:

from pprint import pformat
import csv


def graph_from_csv(filename):
    """Turn a CSV of child, parent, relation entries into a graph represented
    by a dictionary, for example:

    {'A': [('B', 'ab'), ('C', 'ac')],
     'B': [('C','bc'), ('D','bd')],
     'C': [('D','cd')],
     'D': [('C','dc')]}
"""
    rows = []
    with open(filename, 'r') as f:
        reader = csv.reader(f)

        # Skip header row
        next(reader, None)

        for row in reader:
            rows.append(row)


    graph = dict()
    for row in rows:
        child, parent, relation = row
        if not parent in graph:
            graph[parent] = []
        graph[parent].append((child, relation))

    return graph


def find_shortest_path(graph, start, end, path=[], relations=[]):
    path = path + [start]
    if start == end:
        return path, relations
    if not start in graph:
        return (None, None)
    shortest = None
    shortest_rels = None
    for node, relation in graph[start]:
        if node not in path:
            newpath, newrelations = find_shortest_path(graph, node, end, path,
                                                       relations + [relation])
            if newpath:
                if not shortest or len(newpath) < len(shortest):
                    shortest_rels = newrelations
                    shortest = newpath
    return shortest, shortest_rels


def format_output(path, relations):
    p = '->'.join(path)
    r = ''.join(relations)
    return "%s,%s" % (p, r)


def main():
    graph = graph_from_csv('test2.csv')
    print("Graph:")
    print(pformat(graph))
    print()

    start, end = 'Q', 'M54'
    path, relations = find_shortest_path(graph, start, end)
    print("Shortest path from '%s' to '%s'" % (start, end))
    print(format_output(path, relations))
    print()

    start, end = 'Q', 'M7'
    path, relations = find_shortest_path(graph, start, end)
    print("Shortest path from '%s' to '%s'" % (start, end))
    print(format_output(path, relations))
    print()


if __name__ == '__main__':
    main()

Output:

Graph:
{'M27': [('M54', 'E')],
 'M3': [('M7', 'P')],
 'M7': [('M54', 'P'), ('Q', 'P')],
 'Q': [('M3', 'P'), ('M7', 'E'), ('M27', 'E')]}

Shortest path from 'Q' to 'M54'
Q->M7->M54,EP

Shortest path from 'Q' to 'M7'
Q->M7,E

You will notice that the direction for the path query and the resulting path is inverted compared to your example (from ancestor to child) - this is just how the original find_shortest_path function from the essay implemented the parent -> child relationship in a directed graph - and I stuck with it, because I feel that's the most natural way to represent it.

If you do want to invert it though, simply using reversed() on the respective lists (path and relations) and switching start and end will do the trick.


(1) This line warrants some explanation I think:

path = path + [start]

Why not just append to path? This is being done because path=[] is a mutable default argument, and if you would append to path, you would end up changing that default argument for all further calls to that function.

Community
  • 1
  • 1
Lukas Graf
  • 30,317
  • 8
  • 77
  • 92
  • Thanks a lot Graf. But I get this error while running it : - python3 ~/ml/matrix/csv_graph.py /tmp/test2.csv File "/home/vanilla/ml/matrix/csv_graph.py", line 62 print "Graph:" ^ SyntaxError: invalid syntax –  Dec 25 '13 at 12:55
  • Ah, I used Python 2.7, and you're using Python3, where `print` is a function, not a statement. Hang on, I'll update my code. – Lukas Graf Dec 25 '13 at 12:58
  • Sir Im trying it on various input files that I have. So that I could get back to yo in case of any issues. Thanks a lot for the code and the python essay link,that was really informative. Im trying to understand the essay too. –  Dec 27 '13 at 20:55
  • Please help. The program fails on this input file http://www.filedropper.com/sampleinput –  Jan 28 '14 at 06:35
  • @Sunita The solution I provided works for the example input in your question. So if it works for that, please accept the answer, and if it doesn't work for other input, make that a new question. In that question, include the input *inline*, not as a link, and details on how it "doesn't work" (error message, stack trace, ...). – Lukas Graf Jan 29 '14 at 14:02
  • Dear Lukas, I found the issue with my error. Actually I gave 1 100 line input file, but as your code prints hardcoded shortest paths for only two of them ('Q' to 'M54', & 'Q' to 'M7') therefore it was failing. So could you please tell me how to modify your code (in the main() call ) so that it automatically prints the shortest path for all the children (tracing them to their parent Q) . Because if I have hundred of children , them I cant call the all one by one so as to print the shortest for each. Thus it would be helpful if you could modify it to work like this. –  Feb 03 '14 at 11:11
  • Lukas Ive posted it as a different question http://stackoverflow.com/questions/21546291/shortest-path-and-relationship-from-a-csv-file-in-python –  Feb 04 '14 at 07:20