3

I'm having trouble removing the whitespace inside the value fields of my XML data.

For example:

Input

<?xml version="1.0"?>
<ns:myOrder xmlns:ns="http://w3schools.com/BusinessDocument" xmlns:ct="http://something.com/CommonTypes">
  <MessageHeader>
     <ct:ID>i7                           </ct:ID>
     <ct:ID>i7                           </ct:ID>
     <ct:ID>i7                           </ct:ID>
     <ct:ID>i7                           </ct:ID>
     <ct:Name> Company Name           </ct:Name>
 </MessageHeader>
</ns:myOrder>

Expected output:

<?xml version="1.0"?>
  <ns:myOrder xmlns:ns="http://w3schools.com/BusinessDocument" xmlns:ct="http://something.com/CommonTypes">
    <MessageHeader>
       <ct:ID>i7</ct:ID>
       <ct:ID>i7</ct:ID>
       <ct:ID>i7</ct:ID>
       <ct:ID>i7</ct:ID>
       <ct:Name>Company Name</ct:Name>
    </MessageHeader>
  </ns:myOrder>

I tried the code below:

public static String getTrimmedXML(String rawXMLFilename) throws Exception
     {
          BufferedReader in = new BufferedReader(new FileReader(rawXMLFilename));
     String str;
     String trimmedXML = null;     
     while ((str = in.readLine()) != null) 
     {
          String str1 = str;
          if (str1.length()>0) 
          {
               str1 = str1.trim();
               if(str1.charAt(str1.length()-1) == '>')
               {
                    trimmedXML = trimmedXML + str.trim();
               }
               else
               {
                    trimmedXML = trimmedXML + str;
               }
          }
     }     
     in.close();
     return trimmedXML.substring(4);
     }

I'm unable to remove those spaces. Please let me know where I'm going wrong.

Regards, Monish

shockwave
  • 2
    `trim` only removes whitespace at the start and the end of a string (in your case, a line). Try parsing the XML, trimming the text values, and rewriting the XML. – Jens Dec 17 '14 at 07:48
  • The XPath function `normalize-space` will do this trimming. You can use a [modified `Identity transform`](http://en.wikipedia.org/wiki/Identity_transform) to do this in XSL. – StuartLC Dec 17 '14 at 07:51
  • If one of the answers helped you solve the issue, you can check the checkmark on it. This also gives you some additional reputation. – Thomas Weller Feb 19 '15 at 13:35
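
As a minimal sketch of the `normalize-space` suggestion in the comments above, the modified identity transform can be run from Java with the JDK's built-in `javax.xml.transform` API. The class name `NormalizeSpaceTransform` and the method `normalize` are made up for illustration. Keep in mind that `normalize-space` also collapses runs of whitespace inside a value to a single space, and that this stylesheet empties the whitespace-only text nodes used for indentation, so it does more than a plain trim.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NormalizeSpaceTransform {

    // Identity transform plus one override: every text node is replaced by
    // normalize-space(.), which trims both ends and collapses inner runs.
    // The explicit priority avoids a conflict with the node() template.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='text()' priority='1'>"
        + "<xsl:value-of select='normalize-space(.)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical helper: applies the stylesheet to an XML string.
    static String normalize(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question.
        String xml =
              "<?xml version='1.0'?>"
            + "<ns:myOrder xmlns:ns='http://w3schools.com/BusinessDocument'"
            + " xmlns:ct='http://something.com/CommonTypes'>"
            + "<MessageHeader>"
            + "<ct:ID>i7                           </ct:ID>"
            + "<ct:Name> Company Name           </ct:Name>"
            + "</MessageHeader>"
            + "</ns:myOrder>";
        System.out.println(normalize(xml));
    }
}
```

If inner whitespace has to be preserved exactly, a DOM-based trim is probably the safer choice than `normalize-space`.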

5 Answers

2

You probably don't want to use `replace` or `replaceAll`, because they would remove all whitespace in your XML data, not just the padding. To trim only the start and end of the element content, parse the whole XML, select the nodes with XPath, trim them, and transform the document back to a string. Use the code below.

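import java.io.BufferedReader;
import java.io.FileReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public static String getTrimmedXML(String rawXMLFilename, String tagName) throws Exception {
    // Create the XML document object; try-with-resources closes the reader
    Document document;
    try (BufferedReader in = new BufferedReader(new FileReader(rawXMLFilename))) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        document = db.parse(new InputSource(in));
    }
    XPathFactory xpathFactory = XPathFactory.newInstance();
    XPath xpath = xpathFactory.newXPath();

    // Path to the nodes that you want to trim (tagName includes the prefix, e.g. "ct:ID")
    NodeList nodeList = (NodeList) xpath.compile("//*[name()='" + tagName + "']").evaluate(document, XPathConstants.NODESET);
    for (int index = 0; index < nodeList.getLength(); index++) { // Loop through all nodes that match the XPath
        Node node = nodeList.item(index);
        String newTextContent = node.getTextContent().trim(); // Actual trim process
        node.setTextContent(newTextContent); // Note: this replaces any child elements with plain text
    }

    // Transform the document back to string format.
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer transformer = tf.newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter writer = new StringWriter();
    transformer.transform(new DOMSource(document), new StreamResult(writer));
    // Strip line breaks so the result is a single line
    return writer.toString().replaceAll("\n|\r", "");
}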
khairul.ikhwan
1

Below is code that removes the whitespace using vtd-xml.

import com.ximpleware.*;
public class removeWS {

    public static void main(String[] s) throws VTDException, Exception{
        VTDGen vg = new VTDGen();
        AutoPilot ap = new AutoPilot();
        XMLModifier xm = new XMLModifier();
        if (vg.parseFile("d:\\xml2\\ws.xml", true)){
            VTDNav vn = vg.getNav();
            ap.bind(vn);
            xm.bind(vn);
            ap.selectXPath("//text()");
            int i=-1;
            while((i=ap.evalXPath())!=-1){
                int offset = vn.getTokenOffset(i);
                int len = vn.getTokenLength(i);

                long l = vn.trimWhiteSpaces((((long)len)<<32)|offset );
                System.out.println(" ===> "+vn.toString(i));
                System.out.println("len ==>"+len+" new len==>"+ (l>>32));
                int nlen = (int)(l>>32);
                int nos= (int) l;
                xm.updateToken(i,vn,nos,nlen);
            }
            xm.output("d:\\xml2\\new.xml");

        }
    }
}
vtd-xml-author
0

IMHO you should use an XML library, select the affected nodes via XPath, and then trim the text content of each node:

String value = node.getTextContent();
node.setTextContent(value.trim());
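
To make the two lines above concrete, here is a rough, self-contained sketch using only the JDK's DOM, XPath, and transformer APIs. The class name `TextNodeTrimmer` and the method `trimTextNodes` are invented for this example, and selecting every non-blank text node with `//text()[normalize-space(.) != '']` is one possible choice of XPath, not something the answer prescribes. It trims the text nodes themselves, so elements with child elements keep their children.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextNodeTrimmer {

    // Hypothetical helper: parses the XML, trims every non-blank text node,
    // and serializes the document back to a string.
    static String trimTextNodes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Skip whitespace-only text nodes so the indentation between
            // elements is left as it was.
            NodeList texts = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//text()[normalize-space(.) != '']", doc, XPathConstants.NODESET);

            for (int i = 0; i < texts.getLength(); i++) {
                Node text = texts.item(i);
                text.setNodeValue(text.getNodeValue().trim());
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not trim XML text nodes", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<MessageHeader>\n"
                   + "  <ID>i7                           </ID>\n"
                   + "  <Name> Company Name           </Name>\n"
                   + "</MessageHeader>";
        System.out.println(trimTextNodes(xml));
    }
}
```

Unlike a line-based `trim`, this works no matter how the elements are spread across lines, because the parser hands over each value as its own node.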
Community
Thomas Weller
0

Removing all spaces in a string can be done with the String class's replace method like so:

String str = " random    message withlots   of white  spaces     ";
str = str.replace(" ", "");
System.out.println(str);

The above prints str with every space character removed. The replace method takes two arguments: the first is the String to search for, and the second is the String that replaces each occurrence. Neither argument is limited to a single character.
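
For the XML values in the question, it may help to compare this with `trim` side by side. The small sketch below (class and method names are only illustrative) applies both to the padded `Company Name` value from the question: `replace(" ", "")` also removes the space between the two words, while `trim` only strips the padding at the ends.

```java
class ReplaceVersusTrim {

    // Removes every space character, including the ones between words.
    static String removeAllSpaces(String s) {
        return s.replace(" ", "");
    }

    // Removes leading and trailing whitespace only; inner spaces survive.
    static String trimEnds(String s) {
        return s.trim();
    }

    public static void main(String[] args) {
        String value = " Company Name           ";
        System.out.println("[" + removeAllSpaces(value) + "]"); // prints [CompanyName]
        System.out.println("[" + trimEnds(value) + "]");        // prints [Company Name]
    }
}
```

Applied to a whole line of XML rather than to a single value, `replace` would also strip the spaces inside tags, for example between an element name and its attributes.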

Woodrow
-3

Use the replaceAll method in Java.

For example:

String s1 = "<ct:ID>i7                           </ct:ID>";
System.out.println(s1.replaceAll(" ","").trim());
Michaël
sankar
  • 2
    That would remove spaces inside the XML, so if a tag read something like ` 03/15/2017 `, that would turn the attribute "Version Date" into "VersionDate". A correct solution will not alter the XML. – Jim Jarrett Aug 11 '17 at 20:51