Our application sits at the receiving end and does retrospective analysis of XML data. It has no Java or .NET available, but it runs on Unix, so awk and Perl are available.

Each XML message in the file contains this declaration:

<?xml version="1.0" encoding="ISO-8859-1" ?> 

I tried a few options in Perl and awk to remove these declarations, but none of them worked:

perl -p -i -e "s/<?xml version="1.0" encoding="ISO-8859-1" ?>//g"  inputFile
perl -p -i -e "s/<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>//g"  inputFile
perl -p -i -e "s/<\?xml version="1.0" encoding="ISO-8859-1" \?>//g"  inputFile

Is there another way to do this with Perl or awk?

diaryfolio
  • Are you using an XML parser when "receiving" the XML? The XML declaration is useful for a parser, and trying to process any significant XML with anything but a parser will lead to madness. And, if you're on Unix, you have all sorts of languages available to you, if you install them. – the Tin Man Oct 09 '12 at 14:15
  • Take a look at "[How can I mine an XML document with awk, Perl, or Python?](http://stackoverflow.com/a/909076/128421)" for a related answer. – the Tin Man Oct 09 '12 at 14:21
  • Your Perl code isn't working because `?` is a regular expression metacharacter. Replace the `?` with `.` or `\?` in each case and you should be OK. – Jonathan Leffler Oct 09 '12 at 14:22
  • @JonathanLeffler: You are right. But I tried `perl -p -i -e "s/<\?xml version="1.0" encoding="ISO-8859-1" \?>//g" inputFile` and it still doesn't work. I will add it to the main question. – diaryfolio Oct 09 '12 at 14:40
  • Oops - I did not notice that you'd enclosed the `-e` in double quotes; I'd automatically use single quotes. This worked for me: `perl -p -e 's/<\?xml version="1.0" encoding="ISO-8859-1" \?>//g'`. – Jonathan Leffler Oct 09 '12 at 15:06

2 Answers

You don't have to match the whole string if your file is XML. Matching `<?xml version` is enough.

Try:

sed -i '/<\?xml version/d' file

Test:

kent$  echo '<?xml version="1.0" encoding="ISO-8859-1" ?> 
foo
bar
xyz
hello
there'|sed '/<\?xml version/d' 
foo
bar
xyz
hello
there
Bart
Kent
  • @above, it didn't work: "sed: illegal option -- i". I tried without the "-i" option, but it truncated the message incorrectly. – diaryfolio Oct 09 '12 at 14:36
  • @diaryfolio see the test in answer, it did give what you want, didn't it? – Kent Oct 09 '12 at 14:40
  • `$ cat inputFile.xml Monday` `$ cat inputFile.xml | sed '/<\?xml version/d'` `$` It seems to be truncating the whole message. – diaryfolio Oct 09 '12 at 14:47
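
The last comment suggests the declaration and the message share one line, in which case deleting the whole line with `d` would also delete the message. A minimal sketch of an alternative, assuming that layout: substitute only the declaration and write to a new file, which also avoids `-i` on a sed that does not support it. The `<msg>` element, the sample contents, and the output file names are hypothetical, and `mktemp -d` is assumed to be available. In a basic regular expression a bare `?` is a literal character, so the sed pattern uses no backslash; awk uses extended regular expressions, so there the `?` is escaped.

```shell
# Hypothetical sample: declaration and message on the same line.
workdir=$(mktemp -d)
input="$workdir/inputFile.xml"
printf '%s\n' '<?xml version="1.0" encoding="ISO-8859-1" ?><msg>Monday</msg>' > "$input"

# sed: remove only the declaration, keep the rest of the line.
# [^>]* stops at the first '>', which is the end of the declaration.
sed 's/<?xml version[^>]*?>//' "$input" > "$workdir/sed_out.xml"

# awk: same idea; sub() edits the current line in place before printing.
awk '{ sub(/<\?xml version[^>]*\?>/, ""); print }' "$input" > "$workdir/awk_out.xml"

cat "$workdir/sed_out.xml" "$workdir/awk_out.xml"

# Once the output looks right, the original can be replaced by hand:
# mv "$workdir/sed_out.xml" "$input"
```

Writing to a separate file first leaves the original untouched until the result has been checked.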

This worked for me without overwriting the data file:

perl -p -e 's/<\?xml version="1.0" encoding="ISO-8859-1" \?>//g'

I'd only overwrite the file (-i) when I was sure I'd got the basic regex working without doing damage.

Jonathan Leffler