5

I'm trying to read a GraphML file containing a single directed Graph into a Haskell Data.Graph in order to run an analysis using the Math.Combinatorics.Graph module.

However, I can't find any module that allows me to read a GraphML file, producing a Data.Graph. One related module I found is ForSyDe.Backend.GraphML. However, this seems to be specific to the ForSyDe DSL and I currently can't think of a way to use it to read a plain Data.Graph.

Could you point me to a library allowing me to read GraphML, preferably with some example code on how to use it?

Uli Köhler
  • 13,012
  • 16
  • 70
  • 120
  • 2
    GraphML is just XML so why not use one of the many XML parsers for haskell (hxt, Haxml)? A quick google search doesn't turn up anything for haskell GraphML parsers. – user2407038 Jan 10 '14 at 03:07
  • @user2407038 Thanks for your suggestion! If there is nothing else to use, I'll either do this or use FFI to include any existing C/C++ GraphML library, but my main problem with parsing it manually (as XML) is that it will only implement a small subset of GraphML and therefore might be unusable besides in some very specific usecases. – Uli Köhler Jan 10 '14 at 03:24
  • 2
    Why would parsing the XML only implement a small subset? I checked the specification and it seems like a very simple language. – user2407038 Jan 10 '14 at 06:07
  • @user2407038 I have had the experience that most programs don't strictly obey the GraphML standard, some libraries are incompatible with each other. Besides that, features like Hypergraphs won't be as easy to implement correctly, especially without any usable unit tests. Of course, if there's really no such library, that would be the best option. – Uli Köhler Jan 11 '14 at 13:48
  • 1
    Well, if some software doesn't obey the standard, there isn't much you can do. I don't know your use scenario but unless your goal is to create a haskell graphML parser, it would probably make sense to only support which parts you will need. Hypergraphs don't seem that hard, considering there is a hypergraph module in HaskellForMaths. Probably your best bet is to write a bare-minimum parser and go from there, adding features as you need them. – user2407038 Jan 11 '14 at 20:00
  • @user2407038 Thanks, I completely agree with you that this would be the best (if not the only) approach. I'd prefer not to do it (if there would an existing library for this purpose) but unfortunately that doesn't seem to be the case. – Uli Köhler Jan 11 '14 at 20:08
  • @user2407038 Thanks to your suggestions, I managed to put together a minimal GraphML parser that works for some test graphs. – Uli Köhler Jan 21 '14 at 00:33

1 Answers1

4

After more than a week of searching, I assume there is currently no GraphML parser library in existence. Therefore I wrote my own minimal parser.

Let's assume we have this GraphML:

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"  
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
     http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <node id="n3"/>
    <edge id="e1" source="n0" target="n1"/>
    <edge id="e1" source="n1" target="n2"/>
    <edge id="e1" source="n1" target="n3"/>
    <edge id="e1" source="n3" target="n0"/>
  </graph>
</graphml>

I created this HXT-based parser that's able to parse a minimal subset of GraphML (just enough to create a Data.Graph of the above GraphML). The main function of the following file represents an example of how to use it: It prints the list of nodes in the graph (also see this related question ).

{-# LANGUAGE Arrows, NoMonomorphismRestriction #-}
import Text.XML.HXT.Core
import qualified Data.Graph as DataGraph

data Graph = Graph
  { graphId :: String,
    nodes :: [String],
    edges :: [(String, String)] -- (Source, target)
  }
  deriving (Show, Eq)

atTag tag = deep (isElem >>> hasName tag)

parseEdges = atTag "edge" >>>
  proc e -> do
      source <- getAttrValue "source" -< e
      target <- getAttrValue "target" -< e
      returnA -< (source, target)

parseNodes = atTag "node" >>>
  proc n -> do
      nodeId <- getAttrValue "id" -< n
      returnA -< nodeId

parseGraph = atTag "graph" >>>
  proc g -> do
      graphId <- getAttrValue "id" -< g
      nodes <- listA parseNodes -< g
      edges <- listA parseEdges -< g
      returnA -< Graph{graphId=graphId, nodes=nodes, edges=edges}

getEdges = atTag "edge" >>> getAttrValue "source"

-- Get targets for a single node in a Graph
getTargets :: String -> Graph -> [String]
getTargets source graph = map snd $ filter ((==source).fst) $ edges graph

-- Convert a graph node into a Data.Graph-usable
getDataGraphNode :: Graph -> String -> (String, String, [String])
getDataGraphNode graph node = (node, node, getTargets node graph)

-- Convert a Graph instance into a Data.Graph list of (node, nodeid, edge) tuples
getDataGraphNodeList :: Graph -> [(String, String, [String])]
getDataGraphNodeList graph = map (getDataGraphNode graph) (nodes graph)

main :: IO()
main = do
    graphs <- runX (readDocument [withValidate no] "foo.graphml" >>> parseGraph)
    --  Convert Graph structure to Data.Graph-importable tuple list
    let graphEdges = getDataGraphNodeList $ head graphs
    -- Convert to a Data.Graph
    let (graph, vertexMap) = DataGraph.graphFromEdges' graphEdges
    -- Example of what to do with the Graph: Print vertices
    print $ map ((\ (vid, _, _) -> vid) . vertexMap) (DataGraph.vertices graph)
Uli Köhler
  • 13,012
  • 16
  • 70
  • 120