I have code that builds adjacencies from the data in my text file. The file structure is simple: it has only 2 columns, and each row describes a connection between 2 nodes. For example:
ANALYTICAL_BALANCE BFG_DEPOSIT
CUSTOMER_DETAIL BALANCE
BFG_2056 FFD_15
BALANCE BFG_16
BFG_16 STAT_HIST
ANALYTICAL_BALANCE BFG_2056
CUSTOM_DATA AND_11
AND_11 DICT_DEAL
DICT_DEAL BFG_2056
Right now I load the data into a list:
with open('data.txt') as f:
    data = [line.split() for line in f]
and I get a list like this:
data = [
["ANALYTICAL_BALANCE","BFG_DEPOSIT"],
["CUSTOMER_DETAIL","BALANCE"],
["BFG_2056", "FFD_15"],
["BALANCE","BFG_16"],
["BFG_16","STAT_HIST"],
["ANALYTICAL_BALANCE","BFG_2056"],
["CUSTOM_DATA","AND_11"],
["AND_11","DICT_DEAL"],
["DICT_DEAL","BFG_2056"]
]
Then I create the adjacency list:
def create_adj(edges):
    adj = {}  # or use defaultdict(list) to avoid `if` in the loop below
    for a, b in edges:
        if a not in adj:
            adj[a] = []
        if b not in adj:
            adj[b] = []
        adj[a].append(b)
    return adj
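For reference, I think the defaultdict variant mentioned in the comment would look roughly like the sketch below. The name `create_adj_dd` is just a placeholder I made up. Touching `adj[b]` is only there so that nodes without outgoing edges still get an empty list, as in the function above.

```python
from collections import defaultdict

def create_adj_dd(edges):
    # Hypothetical name; intended to give the same result as create_adj.
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b]  # touching the key registers b with an empty list
    # Convert to a plain dict so later lookups cannot add keys by accident.
    return dict(adj)

edges = [["A", "B"], ["B", "C"], ["A", "C"]]
print(create_adj_dd(edges))
# → {'A': ['B', 'C'], 'B': ['C'], 'C': []}
```

As far as I can tell this only tidies the loop; I don't expect it to change the running time much.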
and generate all paths:
def all_paths(adj):
    def recur(path):
        node = path[-1]
        neighbors = [neighbor for neighbor in adj[node] if neighbor not in path]
        if not neighbors:
            yield path
        for neighbor in neighbors:
            yield from recur(path + [neighbor])
    for node in adj:
        yield from recur([node])
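One thing I noticed is that `neighbor not in path` scans the whole path list on every check, and `path + [neighbor]` copies the list at every step. A variant that tracks visited nodes in a set and backtracks in place might look like the sketch below. The name `all_paths_set` is made up, and I have not measured whether it helps on the real file, since the sheer number of paths may be what dominates.

```python
def all_paths_set(adj):
    # Hypothetical variant of all_paths: same traversal order, but
    # membership tests use a set instead of scanning the path list.
    def recur(path, seen):
        node = path[-1]
        neighbors = [n for n in adj[node] if n not in seen]
        if not neighbors:
            yield list(path)  # copy, because path is mutated while backtracking
        for n in neighbors:
            path.append(n)
            seen.add(n)
            yield from recur(path, seen)
            path.pop()        # undo the step before trying the next neighbor
            seen.remove(n)
    for node in adj:
        yield from recur([node], {node})

adj = {"A": ["B", "C"], "B": ["C"], "C": []}
print(list(all_paths_set(adj)))
# → [['A', 'B', 'C'], ['A', 'C'], ['B', 'C'], ['C']]
```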
For the example I gave earlier, the output looks like this. I don't print lists of length 1.
adj = create_adj(data)
paths = all_paths(adj)
for i in paths:
    if len(i) > 1:
        print(i)
output:
['ANALYTICAL_BALANCE', 'BFG_DEPOSIT']
['ANALYTICAL_BALANCE', 'BFG_2056', 'FFD_15']
['CUSTOMER_DETAIL', 'BALANCE', 'BFG_16', 'STAT_HIST']
['BALANCE', 'BFG_16', 'STAT_HIST']
['BFG_2056', 'FFD_15']
['BFG_16', 'STAT_HIST']
['CUSTOM_DATA', 'AND_11', 'DICT_DEAL', 'BFG_2056', 'FFD_15']
['AND_11', 'DICT_DEAL', 'BFG_2056', 'FFD_15']
['DICT_DEAL', 'BFG_2056', 'FFD_15']
Everything is fine while the data set is small, but my txt file has almost 13k rows of these connections, and the script just takes too long to run. That's why I want to change all the list operations to pandas DataFrames, but I don't know how because I have no experience with pandas. How would you do it? Maybe you have a better idea for how to implement this. I was also thinking about using NetworkX, but I don't know how to implement my code with it. Any help would be greatly appreciated!