I am working on software that deals with sparse matrices. They're not huge (anywhere from 15x15 to ~300x300). I'd like to be able to store a representation of the matrix in a short string so that I can store it as a value in a CSV file (along with many other things).
What I've tried so far is to treat the matrix as a binary string, and convert to base62:
import numpy as np
import networkx as nx
def graphToHash(a,numnodes):
def baseN(num,b,numerals="0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"):
return ((num == 0) and numerals[0]) or (baseN(num // b, b, numerals).lstrip(numerals[0]) + numerals[num % b])
return str(numnodes) + '!' + baseN(int(''.join([str(i) for i in flatten_list(a)]),2), 62)
def flatten_list(l):
l1=[item for sublist in l if isinstance(sublist,list) or isinstance(sublist,np.ndarray) for item in sublist]
l=l1+[item for item in l if not isinstance(item,list) and not isinstance(item,np.ndarray)]
return l
# example
import sys
sys.setrecursionlimit(10000)
a=np.array(nx.to_numpy_matrix(nx.connected_watts_strogatz_graph(160,8,.3,1000))).astype(int)
hash=graphToHash(a,160)
len(hash) # ~4300 characters
This works fine for small graphs (30 nodes is ~150 characters). However larger graphs are bit unwieldy (160 nodes is ~4300 and requires me to increase the recursion limit).
Because the graph is binary and sparse, I know I can do better. Ideally I'd like to continue using strings that are {0-9,a-z,A-Z} because I know these will not pose any issues in my CSV file.
What is the most efficient way to compress a binary sparse matrix?