How to parse a DOT file in Python

Question

I have a transducer saved as a DOT file. I can view a graphical representation of the graph using gvedit, but what if I want to convert the DOT file into an executable transducer, so that I can test it and see which strings it accepts and which it doesn't?

In most of the tools I have seen (OpenFst, Graphviz, and their Python extensions), DOT files are only used to create a graphical representation. What if I want to parse the file to get an interactive program where I can test strings against the transducer?

Are there any libraries out there that would do the task or should I just write it from scratch?

As I said, the DOT file describes a transducer I have designed that simulates the morphology of English. It is a huge file, but to give you an idea of what it is like, here is a sample. Let's say I want to create a transducer that models the behavior of English nouns with regard to plurality. My lexicon consists of only three words (book, boy, girl). My transducer in this case would look something like this:

[image: rendering of the transducer graph]

which is directly constructed from this DOT file:

digraph A {
rankdir = LR;
node [shape=circle,style=filled] 0
node [shape=circle,style=filled] 1
node [shape=circle,style=filled] 2
node [shape=circle,style=filled] 3
node [shape=circle,style=filled] 4
node [shape=circle,style=filled] 5
node [shape=circle,style=filled] 6
node [shape=circle,style=filled] 7
node [shape=circle,style=filled] 8
node [shape=circle,style=filled] 9
node [shape=doublecircle,style=filled] 10
0 -> 4 [label="g "];
0 -> 1 [label="b "];
1 -> 2 [label="o "];
2 -> 7 [label="y "];
2 -> 3 [label="o "];
3 -> 7 [label="k "];
4 -> 5 [label="i "];
5 -> 6 [label="r "];
6 -> 7 [label="l "];
7 -> 9 [label="<+N:s> "];
7 -> 8 [label="<+N:0> "];
8 -> 10 [label="<+Sg:0> "];
9 -> 10 [label="<+Pl:0> "];
}

Testing this transducer against the words means that if you feed it book+Pl it should return books, and vice versa. I'd like to know how to turn the DOT file into a format that allows such analysis and testing.

A DOT file represents a graph, which consists of nodes and edges. I guess that the nodes are input or output points, and that an edge between two nodes represents transportation. If you show the .dot file, you may get more useful comments and/or answers. — Fumu 7, Feb 04 '15 at 05:26

score 6 · Answer 1 · edited Jul 01 '20 at 10:15

Install the graphviz library. Then try the following:

import graphviz
graphviz.Source.from_file('graph4.dot')

edited Jul 01 '20 at 10:15 by Somo S. · answered Apr 03 '16 at 17:31 by Geovanny

This is not really parsing the DOT file. – Juan Leni Mar 17 '17 at 17:49
You're right, it isn't parsing the file into a useful structure like the OP asked. However, it is enough to render the graph (in Spyder), which solved my problem! – Leo Jan 03 '19 at 07:10
If I do that, technically I'm parsing the file with Python, now I can dump it in other formats. So the answer is valid, OP was not requesting to avoid using third party libraries. – Ariel M. Jan 22 '21 at 01:00

score 3 · Answer 2 · answered Feb 04 '15 at 05:57

You could start by loading the file using pydot (https://code.google.com/p/pydot/). From there it should be relatively simple to write the code to traverse the in-memory graph according to an input string.
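
As a rough sketch of that traversal, assume the edges have already been pulled out of the loaded graph as (source, destination, label) tuples with the quotes and padding stripped. The label convention (a plain symbol maps to itself, `<upper:lower>` is a pair, `0` means the empty string), the start state `0`, and the names `EDGES`, `FINAL`, `split_label` and `transduce` are my assumptions for illustration, not part of pydot. Note that the sample graph expects the `+N` tag before `+Pl` or `+Sg`.

```python
import re

# Edges of the sample transducer as (source, destination, label) tuples.
EDGES = [
    ("0", "4", "g"), ("0", "1", "b"), ("1", "2", "o"), ("2", "7", "y"),
    ("2", "3", "o"), ("3", "7", "k"), ("4", "5", "i"), ("5", "6", "r"),
    ("6", "7", "l"), ("7", "9", "<+N:s>"), ("7", "8", "<+N:0>"),
    ("8", "10", "<+Sg:0>"), ("9", "10", "<+Pl:0>"),
]
START, FINAL = "0", {"10"}  # node 10 is the doublecircle in the DOT file


def split_label(label):
    """Return (upper, lower); a plain label maps a symbol to itself."""
    pair = re.fullmatch(r"<(.+):(.+)>", label)
    upper, lower = pair.groups() if pair else (label, label)
    # Assumed convention: "0" stands for the empty string (epsilon).
    return ("" if upper == "0" else upper, "" if lower == "0" else lower)


# Transition table: state -> list of (upper, lower, next_state).
ARCS = {}
for src, dst, label in EDGES:
    ARCS.setdefault(src, []).append((*split_label(label), dst))


def transduce(text, invert=False, state=START, out=""):
    """Return every output string the transducer produces for `text`.

    invert=False reads the upper side (book+N+Pl) and writes the lower side;
    invert=True goes the other way. Epsilon cycles would recurse forever;
    the sample graph has none.
    """
    results = [out] if not text and state in FINAL else []
    for upper, lower, dst in ARCS.get(state, []):
        read, write = (lower, upper) if invert else (upper, lower)
        if text.startswith(read):
            results += transduce(text[len(read):], invert, dst, out + write)
    return results


print(transduce("book+N+Pl"))          # ['books']
print(transduce("boys", invert=True))  # ['boy+N+Pl']
```

A string the graph does not accept simply comes back as an empty list, which gives a quick way to test the lexicon.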

answered Feb 04 '15 at 05:57 by John Zwinck

Could you elaborate on that a bit more? I know about pydot and I know that you can load a dot file in there. The `dot_parser` in pydot converts the dot file into some internal class representation. But I am not sure how I can use that. Pydot is basically an interface for Graphviz afaik. – Morteza R Feb 04 '15 at 06:17
@schmutter: see here: http://stackoverflow.com/a/22935664/4323 - you can load the edges. If you want a more full-featured graph library, see https://code.google.com/p/python-graph/ which can also load Dot files, and has algorithms included. – John Zwinck Feb 04 '15 at 06:22
I'm not able to use (the current version) of pydot; it says it requires pyparsing. I downloaded the latest version of pyparsing, but pydot tried to import something from pyparsing that doesn't exist. Grr >:( – allyourcode Feb 08 '16 at 22:56

jeff wang · Answer 3 · 2019-04-30T20:41:19.053

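Use this to load a .dot file in Python and show it as an image (apath is the path to your .dot file):

import tempfile
import pydot
from PIL import Image  # Pillow; a bare "import Image" no longer works

# Recent pydot versions return a list of graphs; take the first one.
graph = pydot.graph_from_dot_file(apath)[0]

# SHOW as an image
fout = tempfile.NamedTemporaryFile(suffix=".png")
graph.write(fout.name, format="png")
Image.open(fout.name).show()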

edited Apr 30 '19 at 20:41 · answered Apr 30 '19 at 20:32 by jeff wang

score 3 · Answer 4 · answered Jul 01 '20 at 02:23

Another path, and a simple way of finding cycles in a dot file:

import pygraphviz as pgv
import networkx as nx

gv = pgv.AGraph('my.dot', strict=False, directed=True)
G = nx.DiGraph(gv)

cycles = nx.simple_cycles(G)
for cycle in cycles:
    print(cycle)

answered Jul 01 '20 at 02:23 by dsz

Looks good, but cannot be installed at the moment. `pip install pygraphviz` fails as well as `pip3 install pygraphviz`. – JohnnyFromBF Sep 22 '20 at 10:29
@JohnnyFromBF - on what platform? I have it working on Mac & Linux, but have had issues on Windows configurations (using Anaconda). – dsz Sep 23 '20 at 08:15
I'm on latest Debian Buster. – JohnnyFromBF Sep 23 '20 at 08:50
@JohnnyFromBF - from memory, you'll need to install both Graphviz, and something like `libgraphviz-dev`, to get the build prerequisites. If that doesn't work, please post the error you're seeing. – dsz Sep 24 '20 at 03:58

score 0 · Answer 5 · answered Jan 03 '19 at 07:06

Guillaume's answer is sufficient to render the graph in Spyder (3.3.2), which might solve some folks' problems.

If you really need to manipulate the graph, as the OP does, it will be a bit more complex. Part of the problem is that Graphviz is a graph rendering library, while you are trying to analyse the graph. What you are trying to do is similar to reverse engineering a Word or LaTeX document from a PDF file.

If you can assume the nice structure of the OP's example, then regular expressions work. An aphorism I like is that if you solve a problem with regular expressions, now you have two problems. Nonetheless, that might just be the most practical thing to do for these cases.

Here are expressions to capture:

your node information: r"node.*?=(\w+).*?\s(\d+)". The capture groups are the kind and the node label.
your edge information: r"(\d+).*?(\d+).*?\"(.+?)\s". The capture groups are source, sink, and the edge label.

To try them out easily see https://regex101.com/r/3UKKwV/1/ and https://regex101.com/r/Hgctkp/2/.
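
A minimal sketch of applying the two expressions line by line, using a shortened copy of the OP's file. The names `DOT_TEXT`, `NODE_RE`, `EDGE_RE`, `nodes`, `edges` and `final_states` are mine, and it only works while the file keeps exactly this layout:

```python
import re

# Shortened copy of the sample DOT file from the question.
DOT_TEXT = '''digraph A {
rankdir = LR;
node [shape=circle,style=filled] 0
node [shape=circle,style=filled] 7
node [shape=circle,style=filled] 9
node [shape=doublecircle,style=filled] 10
0 -> 1 [label="b "];
7 -> 9 [label="<+N:s> "];
9 -> 10 [label="<+Pl:0> "];
}'''

NODE_RE = re.compile(r"node.*?=(\w+).*?\s(\d+)")
EDGE_RE = re.compile(r"(\d+).*?(\d+).*?\"(.+?)\s")

nodes = {}   # node id -> shape, e.g. "10" -> "doublecircle"
edges = []   # (source, sink, label) tuples

for line in DOT_TEXT.splitlines():
    node = NODE_RE.search(line)
    edge = EDGE_RE.search(line)
    if node:
        shape, node_id = node.groups()
        nodes[node_id] = shape
    elif edge:
        edges.append(edge.groups())

# Accepting states are the ones drawn as double circles.
final_states = {n for n, shape in nodes.items() if shape == "doublecircle"}

print(edges)
print(final_states)
```

From there, the edge tuples can be grouped by source state to form a transition table, and the double-circle nodes serve as the accepting states.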

Well, no, it isn't exactly like trying to reverse engineer a PDF file, at least not into a Word or LaTeX file. Here we want to construct an internal computer representation, a parse tree, from the file. This very operation is performed by the graphviz program before generating its output. — JohanL, Jan 16 '19 at 13:29

score 0 · Answer 6 · answered Apr 06 '20 at 17:34

I haven't tried it yet with the sample above, but NetworkX has a read_dot function that might be a good way to solve this: it converts the file into a graph object that you can then analyze and test.

answered Apr 06 '20 at 17:34 by Alnilam
