I have a large RDF file containing about 600,000 records of Wikidata taxonomy. From this file, I am only interested in subclassOf relations (predicate), thus, I ignore all the other statements keeping only "subclassOf" statements. The statement is like:
a
is a subclassOf
b
,
b
is a subclassOf
c
Like that c
is a parent of b
and b
is a parent of a
. And any parent can have many children. I want to build hierarchical tree using this taxonomy.
I have checked this thread and it almost solved my problem.
Recursively creating a tree hierarchy without using class/object
However, with this, I am getting tree in dictionary which I want to convert into tree data-structure. Following is what I have tried:
data = [['a','x'], ['b','x'], ['c','x'], ['x','y'], ['t','y'], ['y','p'], ['p','q']]
roots = set()
mapping = {}
for child,parent in data:
childitem = mapping.get(child,None)
if childitem is None:
childitem = {}
mapping[child] = childitem
else:
roots.discard(child)
parentitem = mapping.get(parent,None)
if parentitem is None:
mapping[parent] = {child:childitem}
roots.add(parent)
else:
parentitem[child] = childitem
for root in roots:
print(mapping[root])
tree = { id : mapping[id] for id in roots }
print(tree)
Output of tree looks as below:
{'q': {'p': {'y': {'t': {}, 'x': {'c': {}, 'b': {}, 'a': {}}}}}}
I want to convert this dictionary to tree. So e.g. when I say print(mapping['y']), it should give me Node y i.e.
q
├── p
└── y
currently, if I say mapping['y'], it gives me subtree rooted at y. I think there is some easy solution for this, but I am not able to understand it. I have found this link as well https://gist.github.com/hrldcpr/2012250 to convert dictionary into tree, but not sure how to use it in my case. Alernatively, if anyone knows to directly built a tree from the RDF data that I had given above, then it will be most welcome. Probably python's anytree API
will solve my issue.