
I am writing a program in C# that essentially reads an SVG file and does some useful things with the contents. The most complex data I will be working with are paths. They take forms such as this:

<path d="M5.4,3.806h6.336v43.276h20.738v5.256H5.4V3.806z"/>

In this case, the M, h, v, H, V, and z indicate some commands. In a way they are like functions, with the numbers following them being arguments. There are also some more complex ones:

<path d="M70.491,50.826c-2.232,1.152-6.913,2.304-12.817,2.304c-13.682,0-23.906-8.641-23.906-24.626
        c0-15.266,10.297-25.49,25.346-25.49c5.977,0,9.865,1.296,11.521,2.16l-1.584,5.112C66.747,9.134,63.363,8.27,59.33,8.27
        c-11.377,0-18.938,7.272-18.938,20.018c0,11.953,6.841,19.514,18.578,19.514c3.888,0,7.777-0.792,10.297-2.016L70.491,50.826z"/>

In this case, the "c" command is followed by 6 arguments (-2.232, 1.152, -6.913, 2.304, -12.817, and 2.304 in the first case). You can see how this can get tricky. My question is this: is the SO community aware of any existing libraries that read such data into some useful ADTs?

Before I go off coding everything and writing a ton of string-parsing functions, I'd rather not reinvent the wheel. Any advice would also be appreciated. I know how to read an XML document; that isn't the issue here.

Adam S
  • do you only need to transform each path into a List of commands (where each command has its own parameters) or do you need to build a basic interpreter of those commands? – Paolo Falabella Feb 25 '11 at 09:18
  • Both will need to happen, and I'm seeking insight on both. – Adam S Feb 25 '11 at 15:15

2 Answers


I don't know of any specific libraries in C#; however, you could start by parsing this kind of structure like this:

string path = "M5.4,3.806h6.336v43.276h20.738v5.256H5.4V3.806z";
string separators = @"(?=[MZLHVCSQTAmzlhvcsqta])"; // these letters are valid SVG
                             // commands. Whenever we find one, a new command is 
                             // starting. Let's split the string there.
var tokens = Regex.Split(path, separators).Where(t => !string.IsNullOrEmpty(t));
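For reference, a self-contained sketch of that command split. The class name SvgCommandSplitter is hypothetical, and trimming each token is an addition not in the snippet above. The lookahead also matches at index 0, so Regex.Split returns an empty first element there; that is why the empty-string filter is needed.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SvgCommandSplitter
{
    // Zero-width lookahead: split *before* each SVG command letter,
    // so the letter stays at the start of its own token.
    private const string Separators = @"(?=[MZLHVCSQTAmzlhvcsqta])";

    public static List<string> Split(string path)
    {
        // Drop the empty leading element (and any whitespace-only pieces),
        // and trim stray whitespace around each command token.
        return Regex.Split(path, Separators)
                    .Select(t => t.Trim())
                    .Where(t => t.Length > 0)
                    .ToList();
    }
}
```

For example, "M5.4,3.806h6.336v43.276h20.738v5.256H5.4V3.806z" splits into "M5.4,3.806", "h6.336", "v43.276", "h20.738", "v5.256", "H5.4", "V3.806" and "z".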

Now you have a list of tokens, each made of a command letter followed by its arguments. You could then split the arguments in a similar way.

You said the arguments can be separated by a space, a comma or a minus sign (which, unlike the comma and the whitespace, should remain part of the argument), so you can use another simple regex (note that I'm no fan of regular expressions, but in this case I think they add to readability).

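// remainingargs is a command token without its leading letter, e.g. "-2.232,1.152-6.913,2.304-12.817,2.304"
string argSeparators = @"[\s,]|(?=-)"; // discard whitespace and comma but keep the -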
var splitArgs = Regex
   .Split(remainingargs, argSeparators)
   .Where(t => !string.IsNullOrEmpty(t)); 
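A self-contained sketch of the same argument split, this time also converting the pieces to numbers. The class name SvgArguments is hypothetical. Parsing with CultureInfo.InvariantCulture matters: SVG always uses "." as the decimal point, but a bare float.Parse follows the machine's regional settings, so on a culture that uses "," as the decimal separator a value like "5.4" can be misread.

```csharp
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SvgArguments
{
    public static float[] Parse(string remainingargs)
    {
        // Split on whitespace and commas, and before every '-' (keeping the sign).
        return Regex.Split(remainingargs, @"[\s,]|(?=-)")
                    .Where(t => !string.IsNullOrEmpty(t))
                    // Culture-independent parsing: "." is always the decimal point.
                    .Select(t => float.Parse(t, CultureInfo.InvariantCulture))
                    .ToArray();
    }
}
```

Like the regex it is based on, this still splits exponent notation such as 1.78e-34 in two; the comments below discuss that case.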

I would wrap this in an SVGCommand class, like this:

using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

class SVGCommand
{
    public char command {get; private set;}
    public float[] arguments {get; private set;}

    public SVGCommand(char command, params float[] arguments)
    {
        this.command=command;
        this.arguments=arguments;
    }

    public static SVGCommand Parse(string SVGpathstring)
    {
        char cmd = SVGpathstring[0]; // the command letter
        string remainingargs = SVGpathstring.Substring(1);

        // split on whitespace and commas, and before every minus sign (keeping it)
        string argSeparators = @"[\s,]|(?=-)";
        var splitArgs = Regex
            .Split(remainingargs, argSeparators)
            .Where(t => !string.IsNullOrEmpty(t));

        // parse with the invariant culture so "5.4" is read the same on every machine
        float[] floatArgs = splitArgs.Select(arg => float.Parse(arg, CultureInfo.InvariantCulture)).ToArray();
        return new SVGCommand(cmd,floatArgs);
    }
}
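One detail the class above leaves to the caller: in SVG path data a command letter may be followed by several argument sets, so "c" with 12 numbers describes two cubic curves, and extra coordinate pairs after "M" are treated as implicit lineto commands. A sketch of splitting a parsed argument array into one group per command instance; the class SvgArgumentGrouper and its arity table are hypothetical, for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class SvgArgumentGrouper
{
    // Number of arguments consumed by one instance of each command
    // (A = rx, ry, x-axis-rotation, large-arc-flag, sweep-flag, x, y).
    private static readonly Dictionary<char, int> Arity = new Dictionary<char, int>
    {
        ['M'] = 2, ['L'] = 2, ['T'] = 2,
        ['H'] = 1, ['V'] = 1,
        ['C'] = 6, ['S'] = 4, ['Q'] = 4,
        ['A'] = 7, ['Z'] = 0,
    };

    public static List<float[]> Group(char command, float[] args)
    {
        if (!Arity.TryGetValue(char.ToUpperInvariant(command), out int n))
            throw new FormatException($"Unknown path command '{command}'.");

        var groups = new List<float[]>();
        if (n == 0)
        {
            // Z/z closes the subpath and takes no arguments.
            if (args.Length != 0) throw new FormatException("Z takes no arguments.");
            groups.Add(Array.Empty<float>());
            return groups;
        }
        if (args.Length == 0 || args.Length % n != 0)
            throw new FormatException($"'{command}' expects a multiple of {n} arguments, got {args.Length}.");

        // Cut the flat array into consecutive chunks of n values.
        for (int i = 0; i < args.Length; i += n)
        {
            var group = new float[n];
            Array.Copy(args, i, group, 0, n);
            groups.Add(group);
        }
        return groups;
    }
}
```

An interpreter could then loop over Group(c.command, c.arguments) and run the command once per group.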

Now a simple "interpreter" could look something like this:

string path = "M70.491,50.826c-2.232,1.152-6.913,2.304-12.817,2.304c-13.682,0-23.906-8.641-23.906-24.626" +
    "c0-15.266,10.297-25.49,25.346-25.49c5.977,0,9.865,1.296,11.521,2.16l-1.584,5.112C66.747,9.134,63.363,8.27,59.33,8.27" +
    "c-11.377,0-18.938,7.272-18.938,20.018c0,11.953,6.841,19.514,18.578,19.514c3.888,0,7.777-0.792,10.297-2.016L70.491,50.826z";
string separators = @"(?=[A-Za-z])";
var tokens = Regex.Split(path, separators).Where(t => !string.IsNullOrEmpty(t));

// our "interpreter". Runs the list of commands and does something for each of them.
foreach (string token in tokens)
{
    // note that Parse could throw an exception
    // if the path is not correct
    SVGCommand c = SVGCommand.Parse(token);
    Console.WriteLine("doing something with command {0}", c.command);
}
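Since the question also asks about interpreting the commands, here is a sketch of an interpreter that only understands the straight-line commands (M, L, H, V and Z, absolute and relative) and tracks the current point, returning the absolute points the path visits. LinePathInterpreter is a hypothetical name; curves and arcs would need their own cases and simply throw NotSupportedException here. It repeats the tokenizing inline so it stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: handles only M/L/H/V/Z (upper and lower case).
public static class LinePathInterpreter
{
    public static List<(double X, double Y)> Run(string path)
    {
        var points = new List<(double X, double Y)>();
        double x = 0, y = 0;           // current point
        double startX = 0, startY = 0; // start of the current subpath, used by Z

        var tokens = Regex.Split(path, @"(?=[MZLHVCSQTAmzlhvcsqta])")
                          .Select(t => t.Trim())
                          .Where(t => t.Length > 0);

        foreach (string token in tokens)
        {
            char cmd = token[0];
            double[] a = Regex.Split(token.Substring(1), @"[\s,]|(?=-)")
                              .Where(s => s.Length > 0)
                              .Select(s => double.Parse(s, CultureInfo.InvariantCulture))
                              .ToArray();
            bool rel = char.IsLower(cmd); // lower case = relative to the current point

            switch (char.ToUpperInvariant(cmd))
            {
                case 'M':
                case 'L':
                    // (x, y) pairs; pairs after the first one of an M act as lineto.
                    for (int i = 0; i + 1 < a.Length; i += 2)
                    {
                        x = rel ? x + a[i] : a[i];
                        y = rel ? y + a[i + 1] : a[i + 1];
                        if (i == 0 && char.ToUpperInvariant(cmd) == 'M') { startX = x; startY = y; }
                        points.Add((x, y));
                    }
                    break;
                case 'H':
                    foreach (double v in a) { x = rel ? x + v : v; points.Add((x, y)); }
                    break;
                case 'V':
                    foreach (double v in a) { y = rel ? y + v : v; points.Add((x, y)); }
                    break;
                case 'Z':
                    // Close the subpath: jump back to where it started.
                    x = startX; y = startY;
                    points.Add((x, y));
                    break;
                default:
                    throw new NotSupportedException($"Command '{cmd}' is not handled in this sketch.");
            }
        }
        return points;
    }
}
```

Run on the first path from the question, this visits eight points, starting and ending at (5.4, 3.806).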

If you need to do something more sophisticated, F# is probably better suited for the job (and is interoperable with C#). I'm not suggesting you learn F# just for this specific task; I just thought I'd mention it, in case you are already looking into it for something else.

Paolo Falabella
  • I edited the original with an example. The trouble with the arguments is that they can use a comma as a delimiter, or a space, or a minus sign. In the case of the minus sign, it's not a "true delimiter" as it's also part of the argument itself. – Adam S Feb 25 '11 at 15:17
  • @Adam I expanded the answer based on your clarifications. Hope it helps – Paolo Falabella Feb 26 '11 at 08:07
  • There is another pitfall: the code does not handle numbers in scientific notation correctly. For example, 1.78e-34 would be split into two coordinates. Any ideas how to catch that case with the regex? – thalm Jul 16 '12 at 18:46
  • @thalm in the first regex you could use the list of actual SVG commands (which does not include e), i.e. instead of `string separators = @"(?=[A-Za-z])";` you would have something like `string separators = @"(?=[MZLHVCSQTAmzlhvcsqta])";` – Paolo Falabella Jul 17 '12 at 13:36
  • Yes, that's for splitting the commands, right. And then to separate the coordinates you should use the regex pattern @"[\s,]|(?=(?<!e)-)" – thalm Jul 17 '12 at 17:30
  • @thalm if you just want to separate on e you could use a normal string Split, without the regex. Or you could try [parsing the number directly in exponential notation](http://stackoverflow.com/questions/3879463/c-sharp-parse-a-number-from-exponential-notation) – Paolo Falabella Jul 19 '12 at 06:24
  • No, it's the opposite. You have to split on - and keep the - when it's a negative number. When it follows an e you DON'T want to split, but keep the string together. float.Parse can handle scientific notation. – thalm Jul 19 '12 at 13:29
  • I wrote a simple parser in C# a few years ago that copes with SVG paths. I found that 'normalising' the path as a first step made it a bit easier: replace space with ",", replace "-" with ",-", replace ",," with "," and do a bit of trimming. – geedubb Feb 12 '14 at 21:11
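The comments above handle exponents by tweaking the split pattern. A different technique, not from any of the answers, is to match the numbers themselves instead of splitting between them: a number pattern that includes an optional exponent keeps 1.78e-34 whole, and it also reads a run like 0.5.5 as 0.5 followed by .5. A sketch under those assumptions (SvgNumberScanner is a hypothetical name); note it still would not separate compact arc flags such as "00".

```csharp
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SvgNumberScanner
{
    // One number: optional sign, digits with an optional fraction
    // (or a leading-dot fraction), then an optional exponent such as e-34.
    private static readonly Regex Number =
        new Regex(@"[-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?");

    public static float[] Parse(string args)
    {
        // float.Parse accepts exponent notation with its default number styles.
        return Number.Matches(args)
                     .Cast<Match>()
                     .Select(m => float.Parse(m.Value, CultureInfo.InvariantCulture))
                     .ToArray();
    }
}
```

Separators (spaces, commas) never match the pattern, so they are skipped without needing a split expression at all.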

It would be possible to do this using the WPF Geometry class. As far as I can tell, the Path Markup Syntax used by WPF is the same as the SVG path syntax.

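using System.Windows.Media; // WPF (PresentationCore)

var data = "M5.4,3.806h6.336v43.276h20.738v5.256H5.4V3.806z";

// Parse the path markup, then convert it to a PathGeometry
// so its figures and segments can be enumerated.
var geometry = Geometry.Parse(data);

var pathGeometry = PathGeometry.CreateFromGeometry(geometry);

foreach (var figure in pathGeometry.Figures)
{
    // Do something interesting with each path figure.
    foreach (var segment in figure.Segments)
    {
        // Do something interesting with each segment
        // (a PathSegment subclass such as LineSegment or BezierSegment).
    }
}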
bstoney