Using grep in Linux to pipe all URLs contained in an XML file to a separate file

Question

I have an XML file that looks like the following. How can I use grep to search through this file and write all the URLs to a file, one per line?

<menus>
    <defaultMenu>
        <group>
            <menuItem name="Example one" url="http://www.google.com">
                <menuItem name="Example Two" url="http://www.yahoo.com" />
                <menuItem name="Example Three" url="http://www.bing.com" />
            </menuItem>
        </group>
    </defaultMenu>
</menus>

For example, I want the output file to contain:

http://www.google.com
http://www.yahoo.com
http://www.bing.com

Jotne · Accepted Answer · 2014-06-06T10:46:19.573

1

If you want to try GNU awk (GNU awk because RS is set to a multi-character string, which not every awk supports):

awk -v RS="url" -F\" 'NR>1{print $2}' file >newfile
http://www.google.com
http://www.yahoo.com
http://www.bing.com

A simple awk version:

awk -F\" '/url/{print $4}' file
http://www.google.com
http://www.yahoo.com
http://www.bing.com

This works only if the format is always the same: the URL must be the fourth double-quote-delimited field on each line, as it is in the sample.
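
To loosen the dependence on field position, one option is to match the `url="..."` attribute itself rather than counting quote-delimited fields. The following is a minimal sketch, not part of the original answer. It relies on `match()`, `RSTART`, and `RLENGTH`, which POSIX awk provides. The file names `menus.xml` and `urls.txt`, and the sample line with reordered attributes and two elements on one line, are assumptions made for illustration.

```shell
# Hypothetical sample file; the second menuItem line is deliberately reordered
# and holds two elements to show that field position no longer matters.
tmpdir=$(mktemp -d)
cat > "$tmpdir/menus.xml" <<'EOF'
<menus>
    <defaultMenu>
        <group>
            <menuItem name="Example one" url="http://www.google.com">
                <menuItem url="http://www.yahoo.com" name="Example Two" /><menuItem name="Example Three" url="http://www.bing.com" />
            </menuItem>
        </group>
    </defaultMenu>
</menus>
EOF

# For each line, find url="...", print the value between the quotes,
# then keep searching in the remainder of the line.
awk '{
    line = $0
    while (match(line, /url="[^"]*"/)) {
        # Skip the 5 characters of url=" and drop the closing quote
        print substr(line, RSTART + 5, RLENGTH - 6)
        line = substr(line, RSTART + RLENGTH)
    }
}' "$tmpdir/menus.xml" > "$tmpdir/urls.txt"

cat "$tmpdir/urls.txt"
```

This still treats the file as text rather than parsed XML, so it would also pick up any attribute whose name merely ends in `url`.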

edited Jun 06 '14 at 10:46

answered Jun 06 '14 at 10:09

Jotne

Avinash Raj · Answer 2 · 2014-06-06T10:52:57.073

0

With GNU sed:

$ sed -rn 's/^.*url="([^"]*)".*$/\1/p' file
http://www.google.com
http://www.yahoo.com
http://www.bing.com

And with GNU grep, using the -P (Perl-compatible regex) option:

$ grep -oP '(?<=url=\")[^"]*' file
http://www.google.com
http://www.yahoo.com
http://www.bing.com
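
Where `-P` is unavailable (for example with a non-GNU grep), a similar result can probably be had by letting `grep -o` print the whole attribute and then stripping the wrapper with sed. This is a sketch under assumptions, not part of the original answer: `-o` is not specified by POSIX, although GNU and BSD grep both provide it, and the temporary file names are hypothetical.

```shell
# Hypothetical copy of the question's XML
tmpdir=$(mktemp -d)
cat > "$tmpdir/file" <<'EOF'
<menus>
    <defaultMenu>
        <group>
            <menuItem name="Example one" url="http://www.google.com">
                <menuItem name="Example Two" url="http://www.yahoo.com" />
                <menuItem name="Example Three" url="http://www.bing.com" />
            </menuItem>
        </group>
    </defaultMenu>
</menus>
EOF

# grep -o prints each url="..." match on its own line;
# sed then removes the leading url=" and the trailing quote.
grep -o 'url="[^"]*"' "$tmpdir/file" \
    | sed -e 's/^url="//' -e 's/"$//' > "$tmpdir/newfile"

cat "$tmpdir/newfile"
```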

edited Jun 06 '14 at 10:52

answered Jun 06 '14 at 10:17

Avinash Raj

You should mention that this needs GNU grep for `-P` (Perl regular expressions). – Jotne Jun 06 '14 at 10:49

score 0 · Answer 3 · edited Jun 06 '14 at 10:56

0

Suppose your file is sample.html. Run the command below to write the URLs to sample1.html:

cat sample.html | grep -o url=\".*\" | cut -d "=" -f2 > sample1.html

And if you also want to remove the quotes:

cat sample.html | grep -o url=\".*\" | cut -d "=" -f2 | sed "s/\"//g" > sample1.html
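
Two details of this pipeline may matter on input that differs from the sample: `.*` is greedy, so it runs to the last quote on the line when another attribute follows `url`, and `cut -d "="` splits inside any URL that contains `=`. The sketch below compares the pattern above with a tighter variant. The input line and the `example.com` URL are hypothetical, chosen only to trigger both cases.

```shell
tmpdir=$(mktemp -d)
# Hypothetical line: an attribute follows url, and the URL has a query string
cat > "$tmpdir/sample.html" <<'EOF'
<menuItem name="Search" url="http://www.example.com/search?q=grep" target="_blank" />
EOF

# Pattern from the answer: the match extends to the last quote on the line,
# and cut keeps only the text between the first and second '='.
greedy=$(grep -o 'url=".*"' "$tmpdir/sample.html" | cut -d "=" -f2)

# Tighter variant: stop at the first closing quote and split on '"' instead,
# which also removes the quotes without a separate sed step.
strict=$(grep -o 'url="[^"]*"' "$tmpdir/sample.html" | cut -d '"' -f2)

printf 'greedy: %s\nstrict: %s\n' "$greedy" "$strict"
```

On the XML from the question both forms give the same URLs, so the difference only shows up on lines like the hypothetical one here.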

edited Jun 06 '14 at 10:56

Jotne

answered Jun 06 '14 at 10:49

Mahattam

You should not use `cat` with programs that can read the file themselves, like `grep`: `grep -o url=\".*\" sample.html | cut ...` – Jotne Jun 06 '14 at 10:51
Click the `{}` in the help line to mark it as code. – Jotne Jun 06 '14 at 10:56
Yes, I agree. `grep` alone works here without `cat`, and it is faster too. Thanks :) – Mahattam Jun 06 '14 at 10:59
