Try this :
def toTree(expression):
tree = dict()
msg =""
stack = list()
for char in expression:
if(char == '('):
stack.append(msg)
msg = ""
elif char == ')':
parent = stack.pop()
if parent not in tree:
tree[parent] = list()
tree[parent].append(msg)
msg = parent
else:
msg += char
return tree
expression = "(Root(AB(ABC)(CBA))(CD(CDE)(FGH)))"
print toTree(expression)
It returns a dictionary, where the root can be accessed with the key ''. You can then do a simple BFS to print the output.
OUTPUT :
{
'' : ['Root'],
'AB' : ['ABC', 'CBA'],
'Root': ['AB', 'CD'],
'CD' : ['CDE', 'FGH']
}
You will have to eliminate all the whitespaces in the Expression before you start, or ignore the inrrelevant charaters in the expression by adding the following as the very first line in the for-loop :
if char == <IRRELEVANT CHARACTER>:
continue
The above code will run in O(n) time, where n is the length of the expression.
EDIT
Here is the printing function :
def printTree(tree, node):
if node not in tree:
return
print '%s -> %s' % (node, ' '.join(child for child in tree[node]))
for child in tree[node]:
printTree(tree, child)
The desired Output can be achieved by the following :
expression = "(Root(AB(ABC)(CBA))(CD(CDE)(FGH)))"
tree = toTree(expression)
printTree(tree, tree[''][0])
Output
Root -> AB CD
AB -> ABC CBA
CD -> CDE FGH
EDIT
Assuming the node names are not unique, we just have to give new names to the nodes. This can be done using :
def parseExpression(expression):
nodeMap = dict()
counter = 1
node = ""
retExp =""
for char in expression:
if char == '(' or char == ')' :
if (len(node) > 0):
nodeMap[str(counter)] = node;
retExp += str(counter)
counter +=1
retExp += char
node =""
elif char == ' ': continue
else :
node += char
return retExp,nodeMap
The print Function will now change to :
def printTree(tree, node, nodeMap):
if node not in tree:
return
print '%s -> %s' % (nodeMap[node], ' '.join(nodeMap[child] for child in tree[node]))
for child in tree[node]:
printTree(tree, child, nodeMap)
The output can be obtained by using :
expression = " ( Root( SQ ( VBZ ) ( NP ( DT ) ( NN ) ) ( VP ( VB ) ( NP ( NN ) ) ) ))"
expression, nodeMap = parseExpression(expression)
tree = toTree(expression)
printTree(tree, tree[''][0], nodeMap)
Output :
Root -> SQ
SQ -> VBZ NP VP
NP -> DT NN
VP -> VB NP
NP -> NN