
Good night Stack Overflow!

Tonight I'm trying to remove the "header" (the XML declaration) from an XML document I've parsed into a String. I'm using replaceAll to remove the following:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

I need it gone because this string is going to be concatenated with another XML String, and leaving it in would leave two of those headers.

So I tried:

// getXML already has my XML.
getXML = getXML.replaceAll("<?xml version="1.0" encoding="UTF-8" standalone="no"?>", "");

This fails to compile because of the unescaped double quotes inside the string literal. I then tried with escape sequences:

String headerXMLString = ("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>");
getXML = getXML.replaceAll(headerXMLString, "");

This fails as well. The program compiles and runs, but it doesn't delete the string

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

I assume that's because of the escapes (\): the String is technically not the same.

How would one work around this? Any and all help is greatly appreciated.

Erick
  • Please check http://stackoverflow.com/questions/60160/how-to-escape-text-for-regular-expression-in-java. – Smutje May 08 '14 at 06:19

3 Answers


Don't use replaceAll(), which does a regex search.
Instead use replace(), which uses plain-text search.

getXML = getXML.replace(headerXMLString, "");

Note that despite the unfortunate name difference, replace() still replaces all occurrences found.
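A minimal self-contained sketch of the difference, assuming the header appears verbatim in the string. The class and method names and the `<root/>` body are illustrative placeholders, not from the question. It also shows that the escaped literal is not the problem: at runtime the backslashes are gone, and `replaceAll()` fails only because `?` and `.` are regex metacharacters.

```java
public class ReplaceVsReplaceAll {
    // The XML declaration from the question, written with escaped quotes.
    // At runtime the backslashes are gone: the string is exactly the header text.
    static final String HEADER =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>";

    // Plain-text search: removes every literal occurrence of HEADER.
    static String stripWithReplace(String xml) {
        return xml.replace(HEADER, "");
    }

    // Regex search: HEADER is compiled as a pattern, where "<?" means
    // "an optional <", "\"?" means "an optional quote" and "." matches any
    // character, so the pattern never matches the real header text.
    static String stripWithReplaceAll(String xml) {
        return xml.replaceAll(HEADER, "");
    }

    public static void main(String[] args) {
        // Hypothetical input: the header followed by a placeholder root element.
        String xml = HEADER + "<root/>";
        System.out.println(stripWithReplace(xml));    // <root/>
        System.out.println(stripWithReplaceAll(xml)); // header is still there
    }
}
```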


A better approach would be to use regex to match the XML header no matter what it contains:

getXML = getXML.replaceAll("^<\\?xml.*?\\?>", "");

This would also do nothing if there was no header.
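A small sketch of the regex approach wrapped in a helper, assuming the declaration is the very first thing in the string (no leading whitespace or byte-order mark, which `^<` would not skip) and fits on one line (`.` does not match line terminators by default). `stripHeader` is a hypothetical name.

```java
public class StripXmlHeader {
    // "^" anchors at the start of the input; "\\?" escapes the regex
    // metacharacter "?"; ".*?" lazily consumes the attributes up to the first "?>".
    // Assumes no leading whitespace or BOM before "<?xml".
    private static final String HEADER_REGEX = "^<\\?xml.*?\\?>";

    static String stripHeader(String xml) {
        // Leaves the string unchanged when it does not start with a declaration.
        return xml.replaceAll(HEADER_REGEX, "");
    }

    public static void main(String[] args) {
        System.out.println(stripHeader(
                "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?><a/>")); // <a/>
        System.out.println(stripHeader("<?xml version=\"1.0\"?><b/>"));            // <b/>
        System.out.println(stripHeader("<c/>"));                                   // <c/>
    }
}
```

Because the pattern does not depend on the exact attributes, it also handles declarations that differ in version, encoding or standalone values.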

Bohemian

You can use replace() instead of replaceAll(). The following works for me:

String s = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>";
String s2 = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>";
s2 = s2.replace(s, "");
System.out.println(s2);

Output:

(blank line)

EDIT:

How about the following, which reads the header from a file instead of a string literal?

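String s = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>";
// Needs java.io.File and java.util.Scanner; new Scanner(File) throws FileNotFoundException.
Scanner sc = new Scanner(new File("D:\\temp.txt"));
String s2 = sc.nextLine();
sc.close();
System.out.println("b4 " + s2);
// replace() matches s literally; replaceAll() would treat "?" and "." in s as regex syntax and not match.
s2 = s2.replace(s, "");
System.out.println("aftr " + s2);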

File content:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
dev2d
  • This is not applicable in my case because the String that has the line doesn't have escape sequences. You are deleting identical strings. – Erick May 08 '14 at 06:28
  • i told you the same thing :) did not i?? – dev2d May 08 '14 at 06:43

If you want to use a literal pattern, use either Pattern.quote() or \Q ... \E:

Pattern.quote("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>")

http://regex101.com/r/cF3aI1

Working Java example showing both methods:

https://ideone.com/mZwwOs
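In case the linked example is unavailable, here is a self-contained sketch of both methods. It is not necessarily identical to the linked code; the class and method names and the `<root/>` body are illustrative. `Pattern.quote()` is the safer choice in general, since a hand-written `\Q ... \E` breaks if the quoted text itself contains `\E`.

```java
import java.util.regex.Pattern;

public class QuotedHeaderRegex {
    static final String HEADER =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>";

    // Method 1: Pattern.quote() wraps the text so every character is matched literally.
    static String stripWithQuote(String xml) {
        return xml.replaceAll(Pattern.quote(HEADER), "");
    }

    // Method 2: the same idea written by hand with \Q ... \E
    // (safe here because HEADER itself contains no "\E").
    static String stripWithQE(String xml) {
        return xml.replaceAll("\\Q" + HEADER + "\\E", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: the header followed by a placeholder root element.
        String xml = HEADER + "<root/>";
        System.out.println(stripWithQuote(xml)); // <root/>
        System.out.println(stripWithQE(xml));    // <root/>
    }
}
```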

l'L'l