I have an XML file that looks like the following. How can I use grep to search through this file and write all the URLs to a file, one per line?

<menus>
    <defaultMenu>
        <group>
            <menuItem name="Example one" url="http://www.google.com">
                <menuItem name="Example Two" url="http://www.yahoo.com" />
                <menuItem name="Example Three" url="http://www.bing.com" />
            </menuItem>
        </group>
    </defaultMenu>
</menus>

For example, I want the output file to contain:

http://www.google.com
http://www.yahoo.com
http://www.bing.com
Ben Paton

3 Answers

If you would like to try GNU awk (a multi-character RS is not portable to every awk):

awk -v RS="url" -F\" 'NR>1{print $2}' file >newfile
http://www.google.com
http://www.yahoo.com
http://www.bing.com

A simpler awk version:

awk -F\" '/url/{print $4}' file
http://www.google.com
http://www.yahoo.com
http://www.bing.com

This works only if the format is the same on every line, that is, url is always the second quoted attribute.
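If the attribute order can vary, a position-independent variant may be safer. The following is a minimal sketch, not part of the original answer: it uses awk's match() with RSTART and RLENGTH to pull out every url="..." value wherever it sits on the line. The file names menus.xml and urls.txt are hypothetical, and the sample swaps the attribute order on one line on purpose; on that line the field-based command above would print "Example Two" instead of the URL.

```shell
# Work in a scratch directory; the file names are hypothetical.
workdir=$(mktemp -d)

# Sample from the question, with name/url swapped on one line on purpose.
cat > "$workdir/menus.xml" <<'EOF'
<menus>
    <defaultMenu>
        <group>
            <menuItem name="Example one" url="http://www.google.com">
                <menuItem url="http://www.yahoo.com" name="Example Two" />
                <menuItem name="Example Three" url="http://www.bing.com" />
            </menuItem>
        </group>
    </defaultMenu>
</menus>
EOF

# For each line, repeatedly find url="..." and print the text between the quotes.
# RSTART + 5 skips the five characters of url=" and RLENGTH - 6 drops the closing quote too.
awk '{
    line = $0
    while (match(line, /url="[^"]*"/)) {
        print substr(line, RSTART + 5, RLENGTH - 6)
        line = substr(line, RSTART + RLENGTH)   # continue after this match
    }
}' "$workdir/menus.xml" > "$workdir/urls.txt"

cat "$workdir/urls.txt"
```

Note that the pattern would also match an attribute whose name merely ends in url, such as imageurl; for anything beyond simple, regular input a real XML parser is the more reliable choice.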

Jotne

With GNU sed:

$ sed -rn 's/^.*url="([^"]*)".*$/\1/p' file
http://www.google.com
http://www.yahoo.com
http://www.bing.com

And with GNU grep, using the -P (Perl-compatible regex) option:

$ grep -oP '(?<=url=\")[^"]*' file
http://www.google.com
http://www.yahoo.com
http://www.bing.com
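One difference between the two commands shows up when a single line holds more than one url attribute. The sketch below uses a hypothetical one-line input (the example hosts are made up) to illustrate it: the sed substitution runs once per line and its greedy ^.* keeps only the last URL, while grep -o prints every match. It uses sed -E, the spelling that both GNU and BSD sed accept, and a plain grep -o with cut so that -P is not needed.

```shell
# Hypothetical one-line input with two url attributes on the same line.
line='<a url="http://a.example" /><b url="http://b.example" />'

# sed substitutes once per line; the greedy ^.* leaves only the last url.
sed_out=$(printf '%s\n' "$line" | sed -En 's/^.*url="([^"]*)".*$/\1/p')

# grep -o prints each match on its own line; cut keeps the text between the quotes.
grep_out=$(printf '%s\n' "$line" | grep -o 'url="[^"]*"' | cut -d '"' -f2)

printf 'sed:\n%s\n' "$sed_out"
printf 'grep:\n%s\n' "$grep_out"
```

For the sample file in the question, which has one url per line, both approaches give the same three URLs.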
Avinash Raj

Suppose your file is sample.html. Run the command below to write the url attribute values to sample1.html:

grep -o 'url="[^"]*"' sample.html | cut -d "=" -f2- > sample1.html

If you also want to remove the quotes:

grep -o 'url="[^"]*"' sample.html | cut -d "=" -f2- | sed 's/"//g' > sample1.html
Jotne
Mahattam