3

For a child-parent relationship table (csv), I am trying to gather possible parent to child relationship combination chains using all data in the table. I am trying against a problem where if multiple sub-parents exist (see rows 3 & 4), the second sub-parent combination (row 4) is not included in the iteration.

Data Example:

child,parent

A,B
A,C
B,D
B,C
C,D

Expected chain results:

D|B|A
D|C|B|A
D|C|A

Actual chain results:

D|B|A
D|C|A

Code

find= 'A' #The child for which the code should find all possible parent relationships
sequence = ''
with open('testing.csv','r') as f:     #testing.csv = child,parent table (above example)
    for row in f:
        if row.strip().startswith(find):
            parent = row.strip().split(',')[1]
            sequence = parent + '|' + find
            f1 = open('testing.csv','r')
            for row in f1:
                if row.strip().startswith(parent):
                    parent2 = row.strip().split(',')[1]
                    sequence = parent2 + '|' + sequence
                    parent = parent2
        else:
            continue
        print sequence
martineau
  • 119,623
  • 25
  • 170
  • 301
Sarah Bergquist
  • 375
  • 2
  • 6
  • 10
  • I don't understand this: `I am trying against a problem where if multiple sub-parents exist (see rows 3 & 4), the second sub-parent combination (row 4) is not included in the iteration` - yet you list `D|C|B|A` as expected. I don't think that result would be there if you exclude the 4th row: the B|C pair. – Gerrat Dec 02 '14 at 23:32
  • Current code does not consider the 4th row. That's exactly the issue. – Sarah Bergquist Dec 02 '14 at 23:37
  • 1
    As an aside Sarah, looking at your profile, you have asked a number of questions and people have answered. If someone provides an answer that you find acceptable, you should `accept` it by clicking the checkmark next to their answer. You haven't accepted any so far. – Gerrat Dec 03 '14 at 00:12

2 Answers2

7

Have you looked at this fantastic essay? It is essential reading to really understand patterns in python. Your problem can be thought of as a graph problem - finding the relationships is basically finding all paths from a child node to the parent node.

Since there could be an arbitrary amount of nesting (child->parent1->parent2...), you need a recursive solution to find all paths. In your code, you have 2 for loops - which will only result in 3level paths at most as you found out.

The code below was adapted from the link above to fix your issue. The function find_all_paths requires a graph as an input.

Let's create the graph from your file:

graph = {} # Graph is a dictionary to hold our child-parent relationships.
with open('testing.csv','r') as f:
    for row in f:
        child, parent = row.split(',')
        graph.setdefault(parent, []).append(child)

print graph

with your sample, this should print:

{'C': ['A', 'B'], 'B': ['A'], 'D': ['B', 'C']}

The following code is straight from the essay:

def find_all_paths(graph, start, end, path=[]):
    path = path + [start]
    if start == end:
        return [path]

    if not graph.has_key(start):
        return []

    paths = []

    for node in graph[start]:
        if node not in path:
            newpaths = find_all_paths(graph, node, end, path)
            for newpath in newpaths:
                paths.append(newpath)
    return paths

for path in find_all_paths(graph, 'D', 'A'):
    print '|'.join(path)

Output:

D|B|A
D|C|A
D|C|B|A
vikramls
  • 1,802
  • 1
  • 11
  • 15
0

I'm not sure if this is the most efficient way to do it (but reading the file in again on every row would be worse).

find= 'A' #The child for which the code should find all possible parent relationships
sequences = set(find)

# we'll build up a chain for every relationship, then strip out un-needed ones later
with open('testing.csv','r') as f:     #testing.csv = child,parent table (above example)
    for row in f:
        child, parent = row.strip().split(',')
        sequences.add(parent + '|' + child)
        for c in sequences.copy():  
            if c[0] == child:
                sequences.add(parent + '|' + c)


# remove any that don't end with our child:
sequences = set(s for s in sequences if s.endswith(find))

# get all shorter chains when we have a longer one
extra = set()
for g1 in sequences:
    for g2 in sequences:
        if g2[2:] == g1:
            extra.add(g1)

# remove the shorter chains
sequences.difference_update(extra)

for chain in sequences:
    print(chain)

Results:

D|C|A
D|C|B|A
D|B|A
Gerrat
  • 28,863
  • 9
  • 73
  • 101