I would like to pick out the positions and font sizes of each of the symbols used in an SVG of a mathematical equation.

I am playing around with the Python XML parsing library: xml.etree.ElementTree (https://docs.python.org/3/library/xml.etree.elementtree.html).

Here is the example SVG I am using:

example_svg = '''<svg style="vertical-align:-10.2252022445128pt" xmlns="http://www.w3.org/2000/svg" width="193pt" height="31pt" viewBox="-1 -1 193 31">
<path d="M43.875 16.305h20.426" fill="none" stroke-width=".914" stroke="#000" stroke-miterlimit="10"></path>
<g font-family="MathFont" font-size="13.5">
<text y="11.168" x="45.874">3</text>
<text y="11.168" x="52.532"></text> 
<text y="28.382" x="50.758">4</text>
</g>
<g font-family="MathFont" font-size="9.45">
<text y="6.327" x="60.453">3</text></g>
</svg>'''    

In LaTeX the equation is $\frac{3x^3}{4}$.

The following code gives me nearly everything I want, but I can't work out how to connect each <text> element to the attributes of its enclosing <g> group. Ideally I want the output to be (symbol, y_coord, x_coord, font-family, font-size).

import xml.etree.ElementTree as ET

root = ET.fromstring(example_svg)

for tag in root.findall('.//{http://www.w3.org/2000/svg}text'):
  symbol = tag.text
  y_coord = tag.get('y')
  x_coord = tag.get('x')
  print(symbol, y_coord, x_coord)
python1729
  • Font and size are in another tag, not in `<text>`. So that's why you don't see them. Does `etree` allow access to the *parent* of an element? – Jongware Feb 16 '20 at 14:35
  • @usr2564301 this is a helpful comment; it led me to https://stackoverflow.com/questions/2170610/access-elementtree-node-parent-node and I am just following it through to see if it solves the problem. – python1729 Feb 16 '20 at 14:47
  • @usr2564301 Thanks for the help. I think everything follows from this: `for parent in root.getiterator(): for child in parent: print(child.tag, child.attrib, parent.tag, parent.attrib)` – python1729 Feb 16 '20 at 14:51
  • Another option may be more 'natural' to XML: loop over all `<g>` elements, and inside them, loop over their `<text>` children. – Jongware Feb 16 '20 at 14:53

1 Answer

The font family name and size are not specified in <text> elements but in their parent <g> groups. Some care is necessary, because multiple <g> elements may appear nested in each other.

You can locate all <g> elements with findall and temporarily store their font parameters, but if you first find each <g> and then search it for <text> elements, you will soon notice that this cannot handle nested groups properly. Every <g> triggers a search for the <text> elements it contains, and so if there is a set of nested groups such as

<g group parameters> <g text parameters> <text>your text</text> </g> </g>

then it will report all <text> elements twice: once for each (nested or not) <g>.
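
To make the duplicate reporting concrete, here is a minimal sketch. The nested_svg string is a made-up example (it is not the SVG from the question), and it assumes the inner search is recursive, as with iter() or a './/' path:

```python
import xml.etree.ElementTree as ET

NS = '{http://www.w3.org/2000/svg}'

# Hypothetical nested input: the font family sits on the outer <g>,
# the font size on the inner <g>.
nested_svg = '''<svg xmlns="http://www.w3.org/2000/svg">
<g font-family="MathFont"><g font-size="9.45"><text x="1" y="2">a</text></g></g>
</svg>'''

root = ET.fromstring(nested_svg)

hits = []
for g in root.iter(NS + 'g'):          # visits the outer <g>, then the inner <g>
    for text in g.iter(NS + 'text'):   # recursive search below this <g>
        hits.append((text.text, dict(g.attrib)))

for hit in hits:
    print(hit)
```

The single <text> shows up once per enclosing <g>, and each report carries only that group's own attributes, so neither one has both the family and the size.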

A better way, then, is to iterate over the entire XML tree and store the font information as and when you come across it.

import xml.etree.ElementTree as ET

# example_svg is the SVG string from the question
root = ET.fromstring(example_svg)

SVG_NS = '{http://www.w3.org/2000/svg}'

font = None
font_size = None
# iter() walks the tree in document order, so each <g> is seen before its <text> children
for elem in root.iter():
    if elem.tag == SVG_NS + 'g':
        # remember the latest font settings; keep the previous value if the attribute is absent
        font = elem.get('font-family', font)
        font_size = elem.get('font-size', font_size)

    elif elem.tag == SVG_NS + 'text':
        symbol = elem.text
        y_coord = elem.get('y')
        x_coord = elem.get('x')
        print(symbol, y_coord, x_coord, font, font_size)

(Note that ElementTree reports every tag with the full SVG namespace prepended, so the tag comparisons have to include it.)

Result:

3 11.168 45.874 MathFont 13.5
 11.168 52.532 MathFont 13.5
4 28.382 50.758 MathFont 13.5
3 6.327 60.453 MathFont 9.45
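
One caveat with the loop above: the remembered font is never reset when a group closes, so a <text> that comes after a closed <g> would still be reported with that group's font. If your files can contain that, a possible alternative is to build a child-to-parent map (the approach from the question linked in the comments) and look the attribute up on the nearest ancestor. This is only a sketch: inherited and text_records are names I made up, sample is an invented input, and it only handles font-family and font-size given as plain attributes, not via style="..." or CSS.

```python
import xml.etree.ElementTree as ET

NS = '{http://www.w3.org/2000/svg}'


def inherited(elem, name, parents):
    """Hypothetical helper: nearest value of attribute `name` on elem or an ancestor."""
    while elem is not None:
        value = elem.get(name)
        if value is not None:
            return value
        elem = parents.get(elem)  # None once we step above the root
    return None


def text_records(svg_source):
    """Hypothetical helper: (symbol, y, x, font-family, font-size) for every <text>."""
    root = ET.fromstring(svg_source)
    # ElementTree elements do not know their parent, so build the map once.
    parents = {child: parent for parent in root.iter() for child in parent}
    return [(t.text, t.get('y'), t.get('x'),
             inherited(t, 'font-family', parents),
             inherited(t, 'font-size', parents))
            for t in root.iter(NS + 'text')]


# Invented input: a nested <g> that overrides only the size,
# and a <text> that sits outside every group.
sample = '''<svg xmlns="http://www.w3.org/2000/svg">
<g font-family="MathFont" font-size="13.5">
<text y="11.168" x="45.874">3</text>
<g font-size="9.45"><text y="6.327" x="60.453">3</text></g>
</g>
<text y="28.382" x="50.758">4</text>
</svg>'''

for record in text_records(sample):
    print(record)
```

Here the nested <text> gets the family from the outer group and the size from the inner one, while the last <text> gets None for both because it is not inside any <g>.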
Jongware