I have this line from an XML document:

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" contentScriptType="text/ecmascript" width="1024" zoomAndPan="magnify" contentStyleType="text/css" viewBox="0 0 1024 768" height="768" preserveAspectRatio="xMidYMid meet" version="1.0">

I want to split it up using the split method. For example, I want to save each attribute into a String array.

So I'd like:

contentScriptType="text/ecmascript" 
width="1024" 
zoomAndPan="magnify" 
contentStyleType="text/css" 
viewBox="0 0 1024 768" 
height="768"

and so on, each saved as an element of a String array. Is there any way to do this using the split method, or can anybody suggest an easier, more efficient way?

Here is the scary looking regular expression:

\s(.*?)\s?=(?:(?:\\[,"']|[^,"'])+|"(?:\\"|[^"])*(?<!\\)"|'[^']*'|)

Eclipse won't accept this because it reports invalid character constants. Does anybody know how to overcome this error?

Buzz Lightyear
  • You could map that into an object using [Xstream](http://xstream.codehaus.org/) (that's what I would do). Xstream is quite simple and requires almost no configuration. – Augusto Aug 30 '12 at 09:36
  • Why don't you use a real XML parser? There are so many corner cases that doing it correctly with `split()` will be a major pain in the butt. – Joachim Sauer Aug 30 '12 at 09:37
  • You should use [an XML parser](http://stackoverflow.com/questions/373833/best-xml-parser-for-java) instead of splitting. – assylias Aug 30 '12 at 09:37
  • Ideally, I would just split it up into a String array, saving each parameter as a string in there. – Buzz Lightyear Aug 30 '12 at 09:40

3 Answers

3

Read it with DOM or SAX, process the attributes, and add them to a map.
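
A minimal sketch of the DOM variant, assuming the XML is available as a well-formed `String`. The class name `DomAttributes` and the helper `rootAttributes` are made up for illustration, and the sample tag is a shortened, self-closed version of the one in the question.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class DomAttributes {

    // Parses a well-formed XML string and returns the root element's
    // attributes as name -> value pairs.
    static Map<String, String> rootAttributes(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NamedNodeMap attrs = doc.getDocumentElement().getAttributes();

        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample input; note the self-closing "/>" so the document is well formed.
        String xml = "<svg width=\"1024\" viewBox=\"0 0 1024 768\" height=\"768\"/>";
        for (Map.Entry<String, String> e : rootAttributes(xml).entrySet()) {
            System.out.println(e.getKey() + "=\"" + e.getValue() + "\"");
        }
    }
}
```

A `NamedNodeMap` is not kept in any particular order, so the map should be looked up by name rather than relied on for document order.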

Michael-O
2

There are multiple ways to represent the same XML document (see below); differences in whitespace and quoting can make it difficult to write (and maintain) a regular expression.

input.xml (representation 1)

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" contentScriptType="text/ecmascript" width="1024" zoomAndPan="magnify" contentStyleType="text/css" viewBox="0 0 1024 768" height="768" preserveAspectRatio="xMidYMid meet" version="1.0">

input.xml (representation 2)

<?xml version="1.0" encoding="UTF-8"?>
<svg 
     xmlns:xlink = 'http://www.w3.org/1999/xlink'
     xmlns = 'http://www.w3.org/2000/svg' 
     contentScriptType = 'text/ecmascript' 
     width = '1024'
     zoomAndPan = 'magnify'
     contentStyleType = 'text/css'
     viewBox = '0 0 1024 768'
     height = '768'
     preserveAspectRatio = 'xMidYMid meet'
     version = '1.0'>

I would recommend using an XML parser. Below is how it could be done using StAX (JSR-173). An implementation of a StAX parser is included in Java SE 6.

Demo

package forum12193899;

import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class Demo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newFactory();

        String xmlString = "<svg xmlns:xlink=\"http://www.w3.org/1999/xlink\" xmlns=\"http://www.w3.org/2000/svg\" contentScriptType=\"text/ecmascript\" width=\"1024\" zoomAndPan=\"magnify\" contentStyleType=\"text/css\" viewBox=\"0 0 1024 768\" height=\"768\" preserveAspectRatio=\"xMidYMid meet\" version=\"1.0\">";
        XMLStreamReader xsr = xif.createXMLStreamReader(new StringReader(xmlString));

        xsr.nextTag(); // Advance to "svg" element.

        // Namespace declarations (xmlns, xmlns:xlink) are not counted as attributes.
        int attributeCount = xsr.getAttributeCount();
        String[] array = new String[attributeCount];
        for(int x=0; x<attributeCount; x++) {
            // Rebuild each attribute as name="value".
            array[x] = xsr.getAttributeLocalName(x) + "=\"" + xsr.getAttributeValue(x) + "\"";
        }

        // Output the Array
        for(String string : array) {
            System.out.println(string);
        }
    }

}

Output

contentScriptType="text/ecmascript"
width="1024"
zoomAndPan="magnify"
contentStyleType="text/css"
viewBox="0 0 1024 768"
height="768"
preserveAspectRatio="xMidYMid meet"
version="1.0"
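
To illustrate the earlier point that both representations carry the same information, here is a small sketch that reads two differently formatted versions of a tag with StAX and compares the resulting maps. The class name `SameAttributes` and the helper `attributes` are hypothetical, and the sample strings are shortened, self-closed versions of the two representations.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of any library.
class SameAttributes {

    // Returns the attributes of the first element as local name -> value.
    static Map<String, String> attributes(String xml) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        try {
            xsr.nextTag(); // Advance to the first element.
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < xsr.getAttributeCount(); i++) {
                result.put(xsr.getAttributeLocalName(i), xsr.getAttributeValue(i));
            }
            return result;
        } finally {
            xsr.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Representation 1: one line, double quotes, no spaces around '='.
        String compact = "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"1024\""
                + " viewBox=\"0 0 1024 768\" height=\"768\"/>";
        // Representation 2: one attribute per line, single quotes, spaces around '='.
        String spread = "<svg\n"
                + "     xmlns = 'http://www.w3.org/2000/svg'\n"
                + "     width = '1024'\n"
                + "     viewBox = '0 0 1024 768'\n"
                + "     height = '768'/>";

        // Map equality ignores ordering, so only names and values are compared.
        System.out.println(attributes(compact).equals(attributes(spread))); // prints "true"
    }
}
```

The parser normalises quoting and whitespace, which is the work a hand-written regular expression would otherwise have to do.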
bdoughan
  • How do I pass the XML into this? I'm reading it line by line using 'delta.getOriginal().getLines()', which then goes through a for-each loop with an object for each line. Can I cast that object (line) to anything? – Buzz Lightyear Aug 30 '12 at 11:08
  • @BuzzLightyear - `XMLInputFactory` can create an `XMLStreamReader` on many different types of input. If you have the XML as a `String`, then you can use a `StringReader` as input. – bdoughan Aug 30 '12 at 11:18
  • @BuzzLightyear - Is your XML currently represented as a `String`? – bdoughan Aug 30 '12 at 11:35
  • @BuzzLightyear - I have updated the demo code to take a `String` as input instead of a `File`. – bdoughan Aug 30 '12 at 12:39
  • Thanks for your help. I get this error though: ParseError at [row,col]:[1,55] Message: Premature end of file. – Buzz Lightyear Aug 30 '12 at 15:19
  • at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.nextTag(Unknown Source) – Buzz Lightyear Aug 30 '12 at 15:19
  • @BuzzLightyear - It is probably due to your XML fragment not being well formed. Since you have only a single element it should be `<svg .../>` or `<svg ...></svg>` and not just `<svg ...>`. Although the parser I'm using doesn't complain. – bdoughan Aug 30 '12 at 15:27
  • @BuzzLightyear: what is the return type of `delta.getOriginal()`? – Joachim Sauer Aug 30 '12 at 17:18
0

If for some reason you don't want to use SAX (which I would suggest too), the reason Eclipse rejects your regular expression is that you have to escape each \ in the pattern and each " inside the String literal. So your pattern string definition should look like:

String regex = "\\s(.*?)\\s?=(?:(?:\\\\[,\"']|[^,\"'])+|\"(?:\\\\\"|[^\"])*(?<!\\\\)\"|'[^']*'|)";
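
The escaping rule can be tried on a smaller pattern. The sketch below uses a simplified, hypothetical attribute pattern (not the one from the question) to show how the regex `([\w:.-]+)\s*=\s*("[^"]*"|'[^']*')` is written as a Java string literal: every `\` is doubled and every `"` becomes `\"`. The class name `AttributeRegex` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class AttributeRegex {

    // Regex:  ([\w:.-]+)\s*=\s*("[^"]*"|'[^']*')
    // Group 1 is the attribute name, group 2 the quoted value.
    static final Pattern ATTRIBUTE =
            Pattern.compile("([\\w:.-]+)\\s*=\\s*(\"[^\"]*\"|'[^']*')");

    // Collects name=value pairs from a single start tag.
    static List<String> attributes(String tag) {
        List<String> result = new ArrayList<>();
        Matcher m = ATTRIBUTE.matcher(tag);
        while (m.find()) {
            result.add(m.group(1) + "=" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String tag = "<svg width=\"1024\" viewBox='0 0 1024 768' height = \"768\">";
        for (String attribute : attributes(tag)) {
            System.out.println(attribute);
        }
    }
}
```

This only works on a single, simple start tag. It does not decode entity references and knows nothing about comments or CDATA sections, which is why a parser remains the safer choice.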
Martin Kalina