How can I grep exact words (strings) from an XML file? This is part of the XML file (the input file):

 <Sector sectorNumber="1">
    <Cell cellNumber="1" cellIdentity="42901" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
    <Cell cellNumber="2" cellIdentity="42905" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
  </Sector>
  <Sector sectorNumber="2">
    <Cell cellNumber="1" cellIdentity="42902" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
    <Cell cellNumber="2" cellIdentity="42906" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
  </Sector>
  <Sector sectorNumber="3">
    <Cell cellNumber="1" cellIdentity="42903" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
    <Cell cellNumber="2" cellIdentity="42907" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />   
  </Sector>

I want to grep every cellIdentity="...", so basically the output should look like this:

cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"

When I tried grep -E "cellIdentity=" input.xml, I got the whole line, but I need only the part shown above.

user3319356
    [Don't use regex to parse structured data](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not)! – tripleee May 11 '14 at 18:20

4 Answers

Use the -o option of grep to get only the matched pattern. With your example in a file named t.txt:

grep -o 'cellIdentity="[0-9]*"' t.txt 
cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"
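
If only the numeric value is wanted, the -o output can be piped through cut. This is a minimal sketch, not part of the original answer: the file name cells_sample.xml and the function name cell_ids are made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep do; POSIX does not require it).

```shell
# Hypothetical sample file holding two of the question's <Cell> lines.
cat > cells_sample.xml <<'EOF'
<Sector sectorNumber="1">
  <Cell cellNumber="1" cellIdentity="42901" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
  <Cell cellNumber="2" cellIdentity="42905" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
</Sector>
EOF

# cell_ids is a hypothetical helper name.
# grep -o prints each match on its own line; cut then splits on the
# double quote and keeps field 2, the value between the quotes.
cell_ids() {
  grep -o 'cellIdentity="[0-9]*"' "$1" | cut -d '"' -f 2
}

cell_ids cells_sample.xml
```

Because cellNumber, cellRange and the other attributes never match the pattern, only the cellIdentity values reach cut.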
damienfrancois
Jordan@workstation:~$ egrep -o "cellIdentity=\"[0-9]{5}\"" ddff 
cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"

-o only outputs the matching string, and not the entire line.

[0-9]{5} matches exactly five digits.

The rest of the pattern is literal text and matches as written :)
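
To see how the quantifier changes what is matched, the three variants can be compared side by side. This is a sketch under assumptions: the input file quantifier_sample.txt and the function names are invented for illustration, and the empty and six-digit values do not appear in the question's data.

```shell
# Hypothetical input: an empty value, a 5-digit value and a 6-digit value.
printf '%s\n' \
  'cellIdentity=""' \
  'cellIdentity="42901"' \
  'cellIdentity="429010"' > quantifier_sample.txt

# * allows zero or more digits, so the empty attribute matches as well.
match_star() { grep -o 'cellIdentity="[0-9]*"' "$1"; }

# {5} (with -E) requires exactly five digits between the quotes.
match_five() { grep -Eo 'cellIdentity="[0-9]{5}"' "$1"; }

# + (with -E) requires one or more digits.
match_plus() { grep -Eo 'cellIdentity="[0-9]+"' "$1"; }

match_star quantifier_sample.txt
match_five quantifier_sample.txt
match_plus quantifier_sample.txt
```

With the question's data all three give the same result, since every cellIdentity there has five digits. {5} is the strictest choice and would silently skip an ID of any other length.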

PradyJord
    This answer would benefit from an explanation of the `-o` flag and the structure of the regex. Adding those would change this from giving an answer into teaching how one goes about solving this problem. – Jason Aller May 11 '14 at 19:02
You could use this regular expression:

grep -oP 'cellIdentity="\d*"' file
user000001
  • Just a heads-up: `-P` (to support Perl-compatible regexes) is not available on all platforms (e.g, OSX). – mklement0 May 11 '14 at 21:49
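
Where neither -P nor -o is available, sed can do the same extraction with a basic regular expression. This is a sketch rather than anything proposed in the thread: the file name sed_sample.xml and the function name cell_attrs are hypothetical, and it assumes at most one cellIdentity attribute per line, as in the question's input.

```shell
# Hypothetical sample: one <Cell> line and one line without the attribute.
printf '%s\n' \
  '<Cell cellNumber="1" cellIdentity="42901" cellRange="35000" />' \
  '</Sector>' > sed_sample.xml

# cell_attrs is a hypothetical helper name.
# -n suppresses default output; the s command replaces the whole line with
# the captured attribute, and the p flag prints only lines where that worked.
cell_attrs() {
  sed -n 's/.*\(cellIdentity="[0-9]*"\).*/\1/p' "$1"
}

cell_attrs sed_sample.xml
```

If a line held several cellIdentity attributes, this would print only the last one, because the leading .* is greedy.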
To extract data from XML files, use XML tools:

xmlstarlet sel -t -m "//Cell" -m @cellIdentity -v . -n file.xml

This is far less fragile and handles way more XML files and edge cases than grep.

that other guy
  • Just to clarify: [`xmlstarlet`](http://xmlstar.sourceforge.net/) is not a standard utility. OSX users: the standard `xmllint` utility could be used: `xmllint --xpath '//Cell/@cellIdentity' file.xml | sed 's/^ //; s/ /\'$'\n''/g'` – mklement0 May 11 '14 at 21:47