87

I have a file containing the following lines:

  <parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
  <parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
  <parameter name="RemoteHost" access="readWrite"></parameter>
  <parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="PortMappingProtocol" access="readWrite"></parameter>
  <parameter name="InternalClient" access="readWrite"></parameter>
  <parameter name="PortMappingDescription" access="readWrite"></parameter>

I want to run a command on this file that extracts only the parameter names, as shown in the following output:

$sedcommand file.txt
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription

What command would do this?

Andy Lester
MOHAMED
  • 1
    Note that you're going to be sad when that XML comes to you on multiple lines, or if the order of the arguments changes. If that's at all a possibility, you'll want to look into using a proper XML parser. – Andy Lester May 21 '13 at 16:54
  • Hm, double standard with questions that can be answered in 10 seconds vs. ones that require more time? Where is the post asking what you've tried? Oh wait... – rliu May 21 '13 at 16:59

5 Answers

137

grep was born to extract things:

grep -Po 'name="\K[^"]*'

Test with your data:

kent$  echo '<parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
  <parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
  <parameter name="RemoteHost" access="readWrite"></parameter>
  <parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="PortMappingProtocol" access="readWrite"></parameter>
  <parameter name="InternalClient" access="readWrite"></parameter>
  <parameter name="PortMappingDescription" access="readWrite"></parameter>
'|grep -Po 'name="\K[^"]*'
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription
Kent
  • 9
    Just FYI, from the grep manpage regarding `-P`: "This is highly experimental and **grep -P** may warn of unimplemented features." – Trevor Robinson Dec 03 '14 at 23:20
  • Not all *nix distros support 'grep -o'. One instance I know of is AIX – Max Mar 31 '15 at 19:36
  • 1
    @FukuzawaYukio I think the grep shipped with Ubuntu Linux should support it, right? (I am not an Ubuntu user myself.) The question was tagged with Linux and Ubuntu, not Unix or AIX. But your comment is correct. – Kent Apr 01 '15 at 07:43
  • @Kent Right, I forgot to check what the question was targeting. Ubuntu's grep does indeed support `-o`. – Max Apr 01 '15 at 22:54
  • 11
    I had to look up `\K`: It keeps what's left of it outside of the match (so you don't get `name="PortMappingLeaseDuration"`). [Further reading](http://www.regular-expressions.info/refadv.html) – nachocab May 19 '16 at 19:53
  • 6
    For those not wanting to use the `-P` flag: no other extended regex supported by the default grep will do what `\K` does, but you could simply pipe it through sed: `grep -o 'name="[^"]*' | sed 's/name="//g'` – Leon S. Dec 27 '17 at 15:19
  • 4
    Alternatively you can also use grep twice: `grep -o 'name="[^"]*' | grep -o '[^"]*$'`. It produces the same result. – Crisu83 May 27 '20 at 12:58
  • Yea, no `-P` on MacOS: `grep: invalid option -- P` – sh37211 Jan 03 '23 at 21:24
  • @sh37211 Most Linux distros preinstall the GNU implementation, and the question is tagged "Linux". Please check Leon's comment above if `-P` is not available in your grep implementation. – Kent Jan 11 '23 at 22:14
  • In my case, on Mac, the `grep` regex didn't work without the `-E` option, so the final command is `grep -Eo 'name="[^"]*' | sed 's/name="//g'` – durex Aug 24 '23 at 14:40
  • @durex The question is tagged with `linux`. On macOS, if you haven't installed the GNU utilities (such as GNU grep, GNU sed, GNU awk), the options can differ from those on Linux. – Kent Aug 25 '23 at 09:39
114

sed 's/[^"]*"\([^"]*\).*/\1/'

does the job.

Explanation of the part inside the single quotes:

  • s - tells sed to substitute.
  • / - start of the regex to search for.
  • [^"]* - any character that is not ", any number of times (matches <parameter name=).
  • " - a literal ".
  • \([^"]*\) - anything inside \( \) is saved as a group for later reference. The backslashes are there so the parentheses are treated as grouping rather than as literal characters to search for. [^"]* means the same as above (matches RemoteHost, for example).
  • .* - any character, any number of times (matches " access="readWrite"></parameter>).
  • / - end of the search regex and start of the replacement string.
  • \1 - reference to the string captured in the parentheses above.
  • / - end of the replacement string.

Basically it is s/search for this/replace with this/, but we're telling sed to replace the whole line with just the piece of it we captured earlier.
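As a quick check of the breakdown above, here is a minimal sketch that runs the same substitution on one line copied from the question. The variable names `line` and `name` are only illustrative.

```shell
# One sample line from the question's input.
line='  <parameter name="RemoteHost" access="readWrite"></parameter>'

# [^"]*"      consumes everything up to and including the first double quote
# \([^"]*\)   captures the text up to the next double quote (group 1)
# .*          consumes the rest of the line
# \1          replaces the whole match with group 1
name=$(printf '%s\n' "$line" | sed 's/[^"]*"\([^"]*\).*/\1/')

printf '%s\n' "$name"
```

Note that a line containing no double quote does not match and is printed unchanged. If that matters for your input, `sed -n 's/[^"]*"\([^"]*\).*/\1/p'` prints only the lines where the substitution actually happened.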

Empi3
unxnut
45

You want awk.

This would be a quick and dirty hack:

awk -F "\"" '{print $2}' /tmp/file.txt

PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription
Chris
19

You should not parse XML using tools like sed or awk. It's error-prone.

If the input changes, for example if a newline appears before the name attribute instead of a space, the command will fail some day and produce unexpected results.
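To illustrate this kind of fragility, here is a small sketch using a different but related change: the same element with its attributes in another order. The reordered line is a hypothetical variation, not part of the original input.

```shell
# Hypothetical variation of the question's input: same element,
# but with the attributes in a different order.
line='<parameter access="readWrite" name="RemoteHost"></parameter>'

# Position-based extraction returns the first quoted value, whatever it is.
by_position=$(printf '%s\n' "$line" | cut -d'"' -f2)

# Anchoring on name=" still finds the intended attribute on this line.
by_name=$(printf '%s\n' "$line" | sed -n 's/.*name="\([^"]*\)".*/\1/p')

printf 'by position: %s\n' "$by_position"
printf 'by name:     %s\n' "$by_name"
```

The position-based command yields `readWrite` here, while the one anchored on `name="` yields `RemoteHost`. The anchored version is still only a line-oriented regex heuristic, so it does not help when an element spans several lines.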

If you are really sure that your input will always be formatted this way, you can use cut. It's faster than sed and awk:

cut -d'"' -f2 < input.txt

It is better to parse the XML first and extract only the name attribute of each parameter:

xpath -q -e //@name input.txt | cut -d'"' -f2

To learn more about xpath, see this tutorial: http://www.w3schools.com/xpath/

Michał Šrajer
13

Explaining how you can use cut:

cat yourxmlfile | cut -d'"' -f2

It will 'cut' each line in the file using " as the delimiter and take the 2nd field, which is what you wanted.
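To make the field numbering concrete, here is a minimal sketch that splits one line from the question on " and looks at individual fields. The variable names are only illustrative.

```shell
# One sample line from the question's input.
line='<parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>'

# Splitting on " gives alternating fields: text outside quotes (odd
# field numbers) and attribute values (even field numbers).
f1=$(printf '%s\n' "$line" | cut -d'"' -f1)   # text before the first quote
f2=$(printf '%s\n' "$line" | cut -d'"' -f2)   # first attribute value
f4=$(printf '%s\n' "$line" | cut -d'"' -f4)   # second attribute value

printf 'field 1: %s\nfield 2: %s\nfield 4: %s\n' "$f1" "$f2" "$f4"
```

Field 2 is the parameter name only because name happens to be the first attribute on every line. cut also accepts a file name directly, as in `cut -d'"' -f2 yourxmlfile`, so the pipe from cat is not needed.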

Rushi Agrawal
  • 1
    You want to avoid the [useless `cat`](https://stackoverflow.com/questions/11710552/useless-use-of-cat) though. – tripleee Mar 04 '21 at 11:52