I made a program that extracts the text from an HTML file. It recurses down the HTML document and returns a list of the text tokens it contains. For example:
input: <li>no way <b>you</b> are doing this</li>
output: ['no', 'way', 'you', 'are', ...]
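As a rough illustration of the expected input and output, the snippet below uses Python's standard-library xml.etree.ElementTree in place of BeautifulSoup. That substitution is an assumption made so the example needs no third-party package, and it only works because this particular fragment is well-formed XML, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# The example fragment; ElementTree stands in for the real HTML parser here.
snippet = "<li>no way <b>you</b> are doing this</li>"
root = ET.fromstring(snippet)

# itertext() walks the element and all its descendants in document order,
# yielding every text and tail string; joining them and splitting on
# whitespace gives the individual words.
words = "".join(root.itertext()).split()
print(words)  # ['no', 'way', 'you', 'are', 'doing', 'this']
```

This is not recursive from the caller's point of view, but it shows the target output that the recursive version should reproduce.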
Here is some highly simplified pseudocode for this:
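def get_leaves(node):
    # getchildren, leafnode, process_leaf and rootnode stand in for the real helpers
    kids = getchildren(node)
    for i in kids:
        if leafnode(i):
            # a leaf: process it and record the result
            a = process_leaf(i)
            list_of_leaves.append(a)
        else:
            # an inner node: recurse into its children
            get_leaves(i)

def calling_fn():
    global list_of_leaves  # makes the name module-level so get_leaves() can see it
    list_of_leaves = []
    get_leaves(rootnode)
    print(list_of_leaves)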
Currently list_of_leaves is a global variable: calling_fn() creates it, and get_leaves() appends to it.
My question is: how do I modify my function so that I can write something like list_of_leaves = get_leaves(rootnode), i.e. without using a global variable?
I don't want each recursive call to create its own copy of the list, as the list can get quite big.
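One possible approach, sketched under the assumption that nodes behave like ElementTree elements (a .text attribute, a .tail attribute, and iteration over children) rather than BeautifulSoup tags: create the list once in the outermost call and pass it down as an argument. Python passes references to objects, so every recursive call appends to the same list and nothing is copied; the outermost call then returns it.

```python
import xml.etree.ElementTree as ET

def get_leaves(node, leaves=None):
    """Collect the words under `node` into one shared list and return it."""
    if leaves is None:
        # Only the outermost call creates the list; recursive calls
        # receive a reference to this same object, so nothing is copied.
        leaves = []
    if node.text:
        leaves.extend(node.text.split())
    for child in node:
        get_leaves(child, leaves)
        # A child's tail is the text after its closing tag, which
        # belongs to the parent's content.
        if child.tail:
            leaves.extend(child.tail.split())
    return leaves

root = ET.fromstring("<li>no way <b>you</b> are doing this</li>")
list_of_leaves = get_leaves(root)
print(list_of_leaves)  # ['no', 'way', 'you', 'are', 'doing', 'this']
```

The default is None rather than [] because a mutable default argument is created once, when the function is defined, and would otherwise be shared between separate top-level calls.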
Please don't criticize the design of this particular pseudocode; I simplified it for this question. The real code serves another purpose: extracting tokens along with their associated tags using BeautifulSoup.
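For the actual goal of pairing tokens with tags, a generator is another option: it yields results one at a time instead of building a list inside the recursion, and the caller can wrap it in list() only if a list is really needed. This sketch again uses ElementTree as a stand-in for BeautifulSoup, and it assumes the "associated tag" is the innermost enclosing element; yield from requires Python 3.3 or later.

```python
import xml.etree.ElementTree as ET

def iter_tokens(node):
    """Yield (token, tag) pairs, where tag is the innermost enclosing element."""
    if node.text:
        for token in node.text.split():
            yield token, node.tag
    for child in node:
        # Delegate to the recursive call; its tokens stream out one at a time.
        yield from iter_tokens(child)
        # Tail text sits after the child's closing tag, so it belongs to `node`.
        if child.tail:
            for token in child.tail.split():
                yield token, node.tag

root = ET.fromstring("<li>no way <b>you</b> are doing this</li>")
pairs = list(iter_tokens(root))
print(pairs)
# [('no', 'li'), ('way', 'li'), ('you', 'b'), ('are', 'li'), ('doing', 'li'), ('this', 'li')]
```

The same shape (own text first, then each child followed by its trailing text) should carry over to a BeautifulSoup tree, though the attribute names for children and text nodes differ there.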