I have an XML instance that contains processing instructions. I want a specific one (the Schematron declaration):

<?xml-model href="../../a/b/c.sch" schematypens="http://purl.oclc.org/dsdl/schematron"?>

Other processing instructions may or may not be present, so I can't rely on its position in the DOM; it is guaranteed, however, that there will be at most one such Schematron file reference. So I retrieve it like this:

XProcessingInstruction p = d.Nodes().OfType<XProcessingInstruction>()
   .Where(x => x.Target.Equals("xml-model") && 
    x.Data.Contains("schematypens=\"http://purl.oclc.org/dsdl/schematron\""))
   .FirstOrDefault();

In the example given, the content of p.Data is the string

href="../../a/b/c.sch" schematypens="http://purl.oclc.org/dsdl/schematron"

I need to extract the path specified via @href (i.e. in this example I would want the string ../../a/b/c.sch) without double quotes. In other words: I need the substring after href=" and before the next ". I'm trying to achieve this with LINQ:

var a = p.Data.Split(' ').Where(s => s.StartsWith("href=\""))
       .Select(s => s.Substring("href=\"".Length))
       .Select(s => s.TakeWhile(c => c != '"'));

I would have thought this gave me an IEnumerable<char>, which I could then convert to a string in one of the ways described here, but that's not the case: according to LINQPad, I am getting an IEnumerable<IEnumerable<char>>, which I can't manage to turn into a string.

How could this be done correctly using LINQ? Or would I be better off using Regex within LINQ?


Edit: After writing this down, I came up with a working solution, but it seems very inelegant:

string a = new string
   (
      p.Data.Substring(p.Data.IndexOf("href=\"") + "href=\"".Length)
      .TakeWhile(c => c != '"').ToArray()
   );

What would be a better way?

Philipp Koch
    *I would have thought this gave me an IEnumerable<char>* - it would if there were only one of them, but Where emits multiple strings, and the final Select turns them into multiple character enumerations – Caius Jard Apr 15 '22 at 10:06
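
To illustrate the point made in the comment above, here is a minimal sketch of how the LINQ pipeline from the question could be completed: each matching token is turned into a string inside the Select, and FirstOrDefault then collapses the sequence to a single value. The class name HrefLinq and the method name ExtractHref are hypothetical, and the sketch assumes the path contains no spaces, because the data is split on ' ' as in the original attempt.

```csharp
using System;
using System.Linq;

public static class HrefLinq
{
    // Hypothetical helper: returns the href value from processing-instruction
    // data, or null when no token starting with href=" is present.
    // Assumption: the path contains no spaces, since the data is split on ' '.
    public static string ExtractHref(string data)
    {
        const string prefix = "href=\"";
        return data.Split(' ')
            .Where(s => s.StartsWith(prefix, StringComparison.Ordinal))
            // Build one string per token instead of leaving an IEnumerable<char>.
            .Select(s => string.Concat(s.Substring(prefix.Length).TakeWhile(c => c != '"')))
            // Where can yield several tokens; take the first (or null if none).
            .FirstOrDefault();
    }
}
```

Called with the p.Data string from the question, this should yield the path without the surrounding quotes.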

1 Answer

Try this:

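using System.Text.RegularExpressions;

var input = @"<?xml-model href=""../../a/b/c.sch"" schematypens=""http://purl.oclc.org/dsdl/schematron""?>";
// Lazy capture group: everything between href=" and the next double quote.
var match = Regex.Match(input, @"href=""(.*?)""");
var url = match.Groups[1].Value;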

That gives me ../../a/b/c.sch in url.

Please don't use Regex for general XML parsing, but for this situation it's fine.
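
For completeness, here is a sketch of how the same pattern might be applied to p.Data from the question rather than to a literal string, with a check for the case where no matching processing instruction or no href is found. The class name SchematronReference and the method name FindHref are hypothetical, and the sketch assumes the pseudo-attributes use double quotes, as in the question's example.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class SchematronReference
{
    // Hypothetical helper: returns the href of the first xml-model processing
    // instruction that declares the Schematron namespace, or null if none exists.
    // Assumption: pseudo-attribute values are delimited by double quotes.
    public static string FindHref(XDocument d)
    {
        XProcessingInstruction p = d.Nodes().OfType<XProcessingInstruction>()
            .FirstOrDefault(x => x.Target == "xml-model" &&
                x.Data.Contains("schematypens=\"http://purl.oclc.org/dsdl/schematron\""));
        if (p == null) return null;

        // Same lazy pattern as above, applied to the instruction's data only.
        Match match = Regex.Match(p.Data, @"href=""(.*?)""");
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Checking match.Success avoids silently returning an empty string when the instruction carries no href.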

Enigmativity
    I somehow thought, for no good reason, that since I was using LINQ anyway there should be an "entirely LINQ" way; yours is far more readable and simple. Thank you. And yes, I, too, would never process XML with regex; I'd either use LINQ while in C# or, preferably, simply XSLT/XQuery if possible. – Philipp Koch Apr 15 '22 at 20:38