A .clu file follows this format:
- First line: *Vertices NUMBER_OF_VERTICES
- Second line: Partition of vertex 0
- Third line: Partition of vertex 1
and so on, until all NUMBER_OF_VERTICES vertices have been assigned to a partition.
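To make the layout concrete, here is a small made-up sample (not taken from a real Pajek export) with 11 vertices and the partition labels 1, 2 and 3. The variable names are only for illustration; the sketch just splits the header from the per-vertex labels:

```python
# Hypothetical sample: 11 vertices, one partition label per line after the header.
sample_clu = """*Vertices 11
1
1
1
1
1
2
3
3
3
3
3
"""

header, *rows = sample_clu.splitlines()
keyword, count = header.split()

# labels[i] is the partition label of vertex i (0-based, in file order)
labels = [int(row) for row in rows]

# The header must announce exactly as many vertices as there are label lines.
assert keyword.lower() == "*vertices"
assert len(labels) == int(count)
```

Here vertices 0 to 4 carry label 1, vertex 5 carries label 2, and vertices 6 to 10 carry label 3.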
Judging by the community detection algorithms in networkx (https://networkx.github.io/documentation/stable/reference/algorithms/community.html), the preferred format there is an iterable (e.g. a list or tuple) that groups the vertex numbers of each partition, for example:
- [[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]
That means the first partition is composed of vertices 0, 1, 2, 3 and 4.
So reading a .clu file comes down to converting its per-vertex labels into that structure.
I took the read_pajek function at https://networkx.github.io/documentation/networkx-1.10/_modules/networkx/readwrite/pajek.html#read_pajek and adapted it to read .clu files. The parsing function, parse_pajek_clu, is below (it needs defaultdict from collections):
    from collections import defaultdict


    def parse_pajek_clu(lines):
        """Parse Pajek format partition from string or iterable.

        Parameters
        ----------
        lines : string or iterable
            Data in Pajek partition format.

        Returns
        -------
        communities : list of lists
            One list of vertex numbers (0-based, in file order) per partition,
            ordered by first appearance of the partition label.

        See Also
        --------
        read_pajek_clu()
        """
        if isinstance(lines, str):
            lines = lines.split('\n')
        lines = iter([line.rstrip('\n') for line in lines])
        # Defined up front so the return works even without a *Vertices header.
        communities = defaultdict(list)  # partition label -> vertex numbers
        for l in lines:
            if not l.lower().startswith("*vertices"):
                break
            l, nnodes = l.split()
            for vertice in range(int(nnodes)):
                # Each following line holds the partition label of one vertex.
                community = int(next(lines))
                communities[community].append(vertice)
        return list(communities.values())
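Since the result is a plain list of lists, it is easy to sanity-check before handing it to networkx. The helper below is my own illustration (check_partition is a hypothetical name, it is not in the repository); it only verifies that every vertex from 0 to nnodes - 1 shows up in exactly one partition:

```python
def check_partition(communities, nnodes):
    """Return True if vertices 0..nnodes-1 each appear in exactly one community."""
    # Flatten all communities and compare against the expected vertex numbers.
    seen = sorted(v for community in communities for v in community)
    return seen == list(range(nnodes))


# The example structure from above: 11 vertices in three partitions.
communities = [[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]
print(check_partition(communities, 11))       # True
print(check_partition([[0, 1], [1, 2]], 3))   # False, vertex 1 is duplicated
```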
You can check a working example at the repository:
https://github.com/joaquincabezas/networkx_pajek_util
Also, once you have the partition, a good next step is to draw it with something like this idea from Paul Brodersen:
how to draw communities with networkx
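For drawing you usually need the reverse lookup, from each vertex to its partition. This is only a minimal sketch with names I made up (node_to_community, node_colors); it assumes the vertices are numbered 0 to n - 1 as in the .clu file:

```python
# The example structure from above.
communities = [[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]

# vertex -> index of the community that contains it
node_to_community = {
    node: index
    for index, community in enumerate(communities)
    for node in community
}

# One value per vertex, in vertex order.
node_colors = [node_to_community[node] for node in sorted(node_to_community)]
```

A list like node_colors can then be passed as the node_color argument of the networkx drawing functions, provided the graph's nodes are in the same 0 to n - 1 order.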
I hope this helps!