
I have a large XML file that has tens of thousands of the same elements:

<rootElem>
    <fizz buzz="true">234</fizz>
    <fizz buzz="false">384</fizz>
    <fizz buzz="true"></fizz>
    <fizz buzz="true">39494</fizz>
    <fizz/>
</rootElem>

I'd like to run a grep that prints out any <fizz> elements that do not contain text/body data (the numbers in between the opening & closing tags). In the example above, the grep would produce 2 lines for the 3rd and 5th <fizz> elements that do not contain the numeric data. The file name is fizzes_20.xml. I tried running the following but to no avail:

  • grep fizzes_20.xml "></>"
  • grep fizzes_20.xml "/>"

Any ideas? Thanks in advance!

  • Don't you get errors like `grep: >>: No such file or directory`? File should be the last argument: `grep [OPTIONS] PATTERN [FILE...]` – Roman Newaza Mar 21 '13 at 01:41
  • Oh wait this is Windows. How do I install grep on Windows 98? –  Mar 21 '13 at 02:03
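Following Roman Newaza's comment on argument order, here is a minimal sketch that recreates the sample and reruns the two attempts with the pattern first and the file last. It assumes a POSIX shell with `grep` available. The heredoc only writes the question's sample to `fizzes_20.xml`.

```shell
# Recreate the sample from the question (assumes a POSIX shell and a writable directory).
cat > fizzes_20.xml <<'EOF'
<rootElem>
    <fizz buzz="true">234</fizz>
    <fizz buzz="false">384</fizz>
    <fizz buzz="true"></fizz>
    <fizz buzz="true">39494</fizz>
    <fizz/>
</rootElem>
EOF

# grep expects the pattern first and the file(s) last: grep [OPTIONS] PATTERN [FILE...]
# '></>' still finds nothing, because the closing tag is '</fizz>', not '</>'.
grep -n '></>' fizzes_20.xml || echo "no match"

# '></' catches the open/close pair with nothing in between (line 4 of the sample).
grep -n '></' fizzes_20.xml

# '/>' catches the self-closing element (line 6 of the sample).
grep -n '/>' fizzes_20.xml
```

With the arguments in the right order, the second attempt (`/>`) already works for self-closing tags. The first one only needs to stop at `></` instead of `></>`.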

3 Answers


The `xmllint` command can evaluate an XPath expression that selects the `<fizz>` elements with no text content:

$ xmllint --xpath "//fizz[not(text())]" data.xml 
<fizz buzz="true"/><fizz/>

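A small extension of the same XPath idea, sketched under the assumption that your `xmllint` accepts `--xpath` (per the comments, some older builds do not). `count()` gives a quick total for a large file, and `normalize-space()` also treats a body that contains only whitespace, such as `<fizz> </fizz>`, as empty, which `not(text())` would miss because a whitespace text node is still a text node.

```shell
# Assumption: libxml2's xmllint is installed and supports --xpath.
# Recreate the sample from the question as fizzes_20.xml.
cat > fizzes_20.xml <<'EOF'
<rootElem>
    <fizz buzz="true">234</fizz>
    <fizz buzz="false">384</fizz>
    <fizz buzz="true"></fizz>
    <fizz buzz="true">39494</fizz>
    <fizz/>
</rootElem>
EOF

# Number of <fizz> elements without any text child (2 for the sample).
xmllint --xpath 'count(//fizz[not(text())])' fizzes_20.xml
echo  # make sure the next output starts on its own line

# normalize-space() also counts whitespace-only bodies such as <fizz> </fizz> as empty.
xmllint --xpath '//fizz[not(normalize-space())]' fizzes_20.xml
echo
```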
Update: here is the version of `xmllint` I used:

$ xmllint --version
xmllint: using libxml version 20901
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib Lzma 
Mark O'Connor
  • The flag was `--pattern` on my machine. Or, at least, `--xpath` wasn't present. I couldn't get this to work so there may have been a difference between the two. – 2rs2ts May 06 '14 at 15:17
  • @2rs2ts Odd... I have included my version of xmllint. – Mark O'Connor May 06 '14 at 17:26
  • Yes, apparently it's a recent addition, and they're (probably) not equivalent: http://stackoverflow.com/questions/91791/grep-and-sed-equivalent-for-xml-command-line-processing/14492020?noredirect=1#comment25192563_14492020 – 2rs2ts May 06 '14 at 17:47

It is easy to do with a pattern like this:

grep -E '<fizz/>|<fizz.*><' fizzes_20.xml
Roman Newaza
  • An XML file sometimes contains only a single line (of many MB), in which case printing whole matching lines is not very useful. – hakre Jun 24 '15 at 13:44
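hakre's point can be softened, though not eliminated, by printing only the matched text with `grep -o` and by using an attribute part that cannot run past a `>` (the `.*` in the pattern above can span several elements on one line). This is a minimal sketch assuming a `grep` that supports `-o` and `-E` (GNU and BSD grep both do); `fizzes_oneline.xml` is a hypothetical one-line copy of the sample. It is still a regex over text, so comments, CDATA sections, or a `>` inside an attribute value can fool it.

```shell
# Hypothetical single-line version of the sample, as described in hakre's comment.
printf '%s' '<rootElem><fizz buzz="true">234</fizz><fizz buzz="true"></fizz><fizz/></rootElem>' > fizzes_oneline.xml

# -o prints each match on its own line instead of the whole (huge) line.
# [^>]* cannot cross the end of a start tag, so a match stays inside one <fizz ...> tag;
# requiring whitespace after the name keeps tags like <fizzle/> from matching.
grep -oE '<fizz([[:space:]][^>]*)?(/>|></fizz>)' fizzes_oneline.xml
```

For the one-line sample this prints `<fizz buzz="true"></fizz>` and `<fizz/>` on separate lines.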

Try this command:

grep -E '<fizz.*(/>|></fizz>)' fizzes_20.xml

`<fizz` matches the start of the tag and the element name, `.*` matches any attributes, and the parenthesized part matches either a self-closing tag (`/>`) or a tag that is closed immediately with no content (`></fizz>`). Hope this helps!
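A quick check of this pattern against the question's sample, written with `grep -E` (the standard spelling of `egrep`, which recent GNU grep reports as obsolescent) and with `-n` added so each hit carries its line number, which helps when locating the elements in a large file. The heredoc simply recreates the sample as `fizzes_20.xml`.

```shell
# Recreate the sample from the question.
cat > fizzes_20.xml <<'EOF'
<rootElem>
    <fizz buzz="true">234</fizz>
    <fizz buzz="false">384</fizz>
    <fizz buzz="true"></fizz>
    <fizz buzz="true">39494</fizz>
    <fizz/>
</rootElem>
EOF

# -n prefixes each matching line with its line number.
grep -nE '<fizz.*(/>|></fizz>)' fizzes_20.xml
```

For the sample this reports lines 4 and 6, i.e. the 3rd and 5th `<fizz>` elements from the question.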