4

I'm trying to generate all unique digraphs that fit a spec:

  • each node must have exactly 2 inputs
  • and are allowed arbitrarily many outputs to other nodes in the graph

My current solution is slow. Eg for 6 nodes, the algo has taken 1.5 days to get where I think it's complete, but it'll probably be checking for a few more days still.

My algorithm for a graph with n nodes:

  1. generate all n-length strings of 0, where one symbol is a 1, eg, for n=3, [[0,0,1], [0,1,0], [1,0,0]]. These can be thought of as rows from an identity matrix.

  2. generate all possible n * n matrixes where each row is all possible combinations of step 1. + step 1.

This is the connectivity matrix where each cell represents a connection from column-index to row-index

So, for n=3, these are possible:

  • [0,1,0] + [1,0,0] = [1,1,0]
  • [1,0,0] + [1,0,0] = [2,0,0]

These represent the inputs to a node, and by adding step 1 to itself, the result will always represent 2 inputs.

For ex:

     A B C     
A' [[0,1,1],
B'  [0,2,0],
C'  [1,1,0]]

So B and C connect to A once each: B -> A', C -> A',

And B connects to itself twice: B => B'

  1. I only want unique ones, so for each connectivity matrix generated, I can only keep it if it is not isomorphic to an already-seen graph.

This step is expensive. I need to convert the graph to a "canonical form" by running through each permutation of isomorphic graphs, sorting them, and considering the first one as the "canonical form".

If anyone dives into testing any of this out, here are the count of unique graphs for n nodes:

2 - 6
3 - 44
4 - 475
5 - 6874
6 - 109,934 (I think, it's not done running yet but I haven't found a new graph in >24 hrs.)
7 - I really wanna know! 

Possible optimizations:

  1. since I get to generate the graphs to test, is there a way of ruling them out, without testing, as being isomorphic to already-seen ones?

  2. is there a faster graph-isomorphism algorithm? I think this one is related to "Nauty", and there are others I've read of in papers, but I haven't had the expertise (or bandwidth) to implement them yet.

Here's a demonstrable connectivity matrix that can be plotted at graphonline.ru for fun, showing self connections, and 2 connections to t he same node:

1, 0, 0, 0, 0, 1, 
1, 0, 0, 0, 1, 0, 
0, 1, 0, 1, 0, 0, 
0, 1, 2, 0, 0, 0, 
0, 0, 0, 1, 0, 1, 
0, 0, 0, 0, 1, 0,

here's the code in haskell if you want to play with it, but I'm more concerned about getting the algorithm right (eg pruning down the search space), than the implementation:

-- | generate all permutations of length n given symbols from xs
npermutations :: [a] -> Int -> [[a]]
npermutations xs size = mapM (const xs) [1..size]

identity :: Int -> [[Int]]
identity size = scanl
                (\xs _ -> take size $ 0 : xs)      -- keep shifting right
                (1 : (take (size - 1) (repeat 0))) -- initial, [1,0,0,...]
                [1 .. size-1]                      -- correct size

-- | return all possible pairings of [Column]
columnPairs :: [[a]] -> [([a], [a])]
columnPairs xs = (map (\x y -> (x,y)) xs)
                 <*> xs

-- | remove duplicates
rmdups :: Ord a => [a] -> [a]
rmdups = rmdups' Set.empty where
  rmdups' _ [] = []
  rmdups' a (b : c) = if Set.member b a
    then rmdups' a c
    else b : rmdups' (Set.insert b a) c


-- | all possible patterns for inputting 2 things into one node.
-- eg [0,1,1] means cells B, and C project into some node
--    [0,2,0] means cell B projects twice into one node
binaryInputs :: Int -> [[Int]]
binaryInputs size = rmdups $ map -- rmdups because [1,0]+[0,1] is same as flipped
         (\(x,y) -> zipWith (+) x y)
         (columnPairs $ identity size)

transposeAdjMat :: [[Int]] -> [[Int]]
transposeAdjMat ([]:_) = []
transposeAdjMat m = (map head m) : transposeAdjMat (map tail m)

-- | AdjMap [(name, inbounds)]
data AdjMap a = AdjMap [(a, [a])] deriving (Show, Eq)

addAdjColToMap :: Int -- index
               -> [Int] -- inbound
               -> AdjMap Int
               -> AdjMap Int
addAdjColToMap ix col (AdjMap xs) = 
  let conns = foldl (\c (cnt, i) -> case cnt of
                        1 -> i:c
                        2 -> i:i:c
                        _ -> c
                    ) 
              [] 
              (zip col [0..])  in
    AdjMap ((ix, conns) : xs)

adjMatToMap :: [[Int]] -> AdjMap Int
adjMatToMap cols = foldl 
  (\adjMap@(AdjMap nodes) col -> addAdjColToMap (length nodes) col adjMap)
  (AdjMap [])
  cols

-- | a graph's canonical form : http://mfukar.github.io/2015/09/30/haskellxiii.html
-- very expensive algo, of course
canon :: (Ord a, Enum a, Show a) => AdjMap a -> String
canon (AdjMap g) = minimum $ map f $ Data.List.permutations [1..(length g)]
   where
      -- Graph vertices:
      vs = map fst g
      -- Find, via brute force on all possible orderings (permutations) of vs,
      -- a mapping of vs to [1..(length g)] which is minimal.
      -- For example, map [1, 5, 6, 7] to [1, 2, 3, 4].
      -- Minimal is defined lexicographically, since `f` returns strings:
      f p = let n = zip vs p
            in (show [(snd x, sort id $ map (\x -> snd $ head $ snd $ break ((==) x . fst) n)
                                      $ snd $ take_edge g x)
                     | x <- sort snd n])
      -- Sort elements of N in ascending order of (map f N):
      sort f n = foldr (\x xs -> let (lt, gt) = break ((<) (f x) . f) xs
                                  in lt ++ [x] ++ gt) [] n
      -- Get the first entry from the adjacency list G that starts from the given node X
      -- (actually, the vertex is the first entry of the pair, hence `(fst x)`):
      take_edge g x = head $ dropWhile ((/=) (fst x) . fst) g

-- | all possible matrixes where each node has 2 inputs and arbitrary outs
binaryMatrixes :: Int -> [[[Int]]]
binaryMatrixes size = let columns = binaryInputs size 
                          unfiltered = mapM (const columns) [1..size] in
  fst $ foldl'
  (\(keep, seen) x -> let can = canon . adjMatToMap $ x in
                        (if Set.member can seen 
                         then keep
                         else id $! x : keep
                        , Set.insert can seen))
  ([], Set.fromList [])
  unfiltered
Josh.F
  • 3,666
  • 2
  • 27
  • 37
  • Can nodes have edges to themselves? Can a node have 2 edges to the same other node? – Cirdec Nov 01 '17 at 16:43
  • @Cirdec yes, perhaps my explanation was lacking, but a node can connect to itself or any other node twice. – Josh.F Nov 01 '17 at 16:45
  • Generating them by their out-degree sequence would remove a bunch of isomorhisms. – Cirdec Nov 01 '17 at 16:57
  • @Cirdec Could you clarify? If I understand, I am generating by in-degree, but there can be vastly more variability in the out-degrees (since in=2, and out=arbitrary), so I would guess out-degree would be more expensive. – Josh.F Nov 01 '17 at 17:00
  • I can't read Haskell well enough to tell for sure, but your description of your canonical check suggests that it is only loosely related to nauty's algorithm. Essentially you are saying that https://en.wikipedia.org/wiki/Bogosort is related to quicksort :) – gilleain Nov 01 '17 at 17:00
  • @gilleain you're probably right, I'm no mathematician (yet?). Could there be another algo you'd recommend? Or would implementing Nauty properly be a good solution? – Josh.F Nov 01 '17 at 17:02
  • Implementing a more nauty/bliss -like algorithm might be good, but there are other possibilities. This SO question is a quick introduction to partition refinement : https://stackoverflow.com/questions/26893304/rejecting-isomorphisms-from-collection-of-graphs – gilleain Nov 01 '17 at 17:06
  • Also see : http://pallini.di.uniroma1.it/Introduction.html – gilleain Nov 01 '17 at 17:06
  • What are the 6 graphs for 2 nodes? Perhaps I'm misunderstanding or doing something stupid, but I can only come up with 5. – ryachza Nov 01 '17 at 17:18
  • @ryachza maybe you can plug this into a text editor and add spaces to clean it up? `[[2,0],[2,0]] [[2,0],[1,1]] [[2,0],[0,2]] [[1,1],[2,0]] [[1,1],[1,1]] [[0,2],[2,0]]` – Josh.F Nov 01 '17 at 17:20
  • Oh, one other comment is - do you really have to generate these? As in, can't you use an existing generator to make all digraphs, and filter out those that match your conditions? What do these graph represent, anyway (not important, just curious..)? – gilleain Nov 01 '17 at 17:35
  • @gillean re: "can't you use..." I'm new to this, what would you recommend? re: "what do they represent", sort of like neural nets, but constrained by a binary transition function, eg 2-arity boolean logic gates. I'm applying some ideas from Integrated Information Theory (and my own) to these constructs, just for fun really, trying to explore the space where I consider "general" AI probably lives. – Josh.F Nov 01 '17 at 17:53

1 Answers1

1

There are a number of approaches you could try. One thing that I do note is that having loops with multi-edges (colored loops?) is a little unusual, but is probably just needs a refinement of existing techniques.

Filter the output of another program

The obvious candidate here is of course nAUTy/traces (http://pallini.di.uniroma1.it/) or similar (saucy, bliss, etc). Depending on how you want to do this, it could be as simple as run nauty (for example) and output to file, then read in the list filtering as you go.

For larger values of n this could start to be a problem if you are generating huge files. I'm not sure whether you start to run out of space before you run out of time, but still. What might be better is to generate and test them as you go, throwing away candidates. For your purposes, there may be an existing library for generation - I found this one but I have no idea how good it is.

Use graph invariants

A very easy first step to more efficient listing of graphs is to filter using graph invariants. An obvious one would be degree sequence (the ordered list of degrees of the graph). Others include the number of cycles, the girth, and so on. For your purposes, there might be some indegree/outdegree sequence you could use.

The basic idea is to use the invariant as a filter to avoid expensive checks for isomorphism. You can store the (list of ) invariants for already generated graphs, and check the new one against the list first. The canonical form of a structure is a kind of invariant.

Implement an algorithm

There are lost of GI algorithms, including the ones used by nauty and friends. However, they do tend to be quite hard! The description given in this answer is an excellent overview, but the devil is in the details of course.

Also note that the description is for general graphs, while you have a specific subclass of graph that might be easier to generate. There may be papers out there for digraph listing (generating) but I have not checked.

gilleain
  • 621
  • 3
  • 11
  • 24