
I've got an HTML file, and I'd like to grab all the links in it and save them into another file using Vim.

I know that the regex would be something like:

:g/href="\v([a-z_/]+)"/

but I don't know where to go from here.

Brian Carper
Sasha

5 Answers


Jeff Meatball Yang was almost there.

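As Sasha wrote, if you use w it writes the full original file to the outfile, because inside :g a bare :w still defaults to the whole buffer.

To write only the matched line, you have to add '.' (the current line, which :g sets to each matching line in turn) before 'w':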

:g/href="\v([a-z_/]+)"/ .w >> outfile

Note that the outfile needs to exist.

Konrad
  • If the file doesn't exist, or if you want to create the file, you can use `.w! >> outfile`. – Tom Apr 29 '15 at 21:05
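If you would rather not depend on the outfile existing, the same idea can be sketched as a small function. This is only a sketch assuming Vim 8 or later: the name `AppendMatchingLines` is made up for illustration, and it swaps the `:g` plus `:w >>` combination for `getline()` plus `writefile()` with the `'a'` (append) flag, which creates the file when it is missing.

```vim
" Hypothetical helper (not part of the answer): append every line of the
" current buffer that matches {pat} to {fname}. writefile() with the 'a'
" flag appends, and creates {fname} if it does not exist yet.
function! AppendMatchingLines(pat, fname) abort
  let l:hits = []
  for l:line in getline(1, '$')
    " =~# is a case-sensitive regex match, independent of 'ignorecase'
    if l:line =~# a:pat
      call add(l:hits, l:line)
    endif
  endfor
  call writefile(l:hits, a:fname, 'a')
  " Return how many lines were appended
  return len(l:hits)
endfunction
```

For the question's case you would call it as `:call AppendMatchingLines('href="[^"]\+"', 'outfile')`. Like the `:g` command, it writes whole lines, so a line holding two links still contains both.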

clear reg:x

qxq

search for regex (whatever you need) and append each match to reg:x

:g/regex/call setreg('X', matchstr(getline('.'), 'regex') . "\n")

open a new tab

:tabnew outfile

put reg:x (a normal-mode command, not an Ex command)

"xp

write file

:w
kev
  • I learned a lot from this answer: qxq to clear out reg x, using upper-case X to append to register x, and how to extract only the matching string from a matching line. Thank you! Thank you! – Amjith Sep 09 '11 at 16:21
  • I tried this but if there are multiple matches on one line it only gets the first one. How can you get it to do each match? – Kyle A. Nov 01 '21 at 15:11
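Kyle's question (several matches on one line) can be handled while keeping the register approach: `matchstr()` only ever returns the first match, whereas `substitute()` with the `'g'` flag evaluates a `\=` expression once for every match. A sketch assuming Vim 8 or later; the sample HTML is made up, and the scratch buffer is only there so the snippet runs on its own. In a real file only the `let @x = ''` and `:g` lines are needed, followed by the `:tabnew outfile`, `"xp` and `:w` steps from the answer.

```vim
" Scratch buffer with made-up sample HTML: two links share the first line.
new
call setline(1, ['<a href="/a.html">A</a> <a href="/b.html">B</a>', 'no link here', '<a href="/c.html">C</a>'])
" Same effect as qxq: start with an empty register x.
let @x = ''
" substitute() with the 'g' flag evaluates the \= expression once per match.
" setreg('X', ...) appends each URL plus a newline to register x; the string
" that substitute() returns is simply discarded by :call.
g/href="[^"]\+"/call substitute(getline('.'), 'href="\zs[^"]\+\ze"', '\=setreg("X", submatch(0) . "\n")', 'g')
```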

The challenge here lies in extracting all of the links when there may be several on one line; otherwise you could simply do:

" Append each line with href= to the list ('.' = just that line, '!' creates the file if needed)
:g/href="[^"]\+"/.w! >> list_of_links.txt
" Open the new file
:e list_of_links.txt
" Extract the bit inside the quotation marks
:%s/.*href="\([^"]\+\)".*/\1/

The simplest approach would probably be to do this:

" Save as a new file name
:saveas list_of_links.txt
" Get rid of any lines without href=
:g!/href="\([^"]\+\)"/d
" Break up the lines wherever there is a 'href='
:%s/href=/\rhref=/g
" Delete the leftover fragments from before each href=
:v/href="/d
" Tidy up by removing everything but the bit we want
:%s/^.*href="\([^"]\+\)".*$/\1/

Alternatively (following a similar theme),

:g/href="[^"]\+"/.w! >> list_of_links.txt
:e list_of_links.txt
:%s/href=/\rhref=/g
:v/href="/d
:%s/^.*href="\([^"]\+\)".*$/\1/

(see :help saveas, :help :vglobal, :help :s)
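The split-and-tidy steps of the "simplest approach" can be tried on a scratch buffer before touching a real file. This sketch assumes Vim 8 or later, uses made-up sample lines, and skips the :saveas. It also runs a `:v/href="/d` after the split, because the text in front of the first `href=` on a line ends up on a line of its own and the final `:s` would otherwise leave it in the output.

```vim
" Made-up sample: one line holds two links, the others hold none.
new
call setline(1, ['<ul>', '<li><a href="/a.html">A</a> <a href="/b.html">B</a></li>', '</ul>'])
" Get rid of any lines without href=
silent g!/href="\([^"]\+\)"/d
" Break up the lines wherever there is a 'href='
silent %s/href=/\rhref=/g
" Drop the fragment before the first href= (here '<li><a ')
silent v/href="/d
" Keep only the quoted URL on each remaining line
silent %s/^.*href="\([^"]\+\)".*$/\1/
echo getline(1, '$')
```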

However, if you really wanted to do it in a more direct way, you could do something like this:

" Initialise register 'h'
:let @h = ""
" For each line containing href=..., get the line, and carry out a global search
" and replace that extracts just the URLs and a double quote (as a delimiter)
:g/href="[^"]\+"/let @h .= substitute(getline('.'), '.\{-}href="\([^"]\+\)".\{-}\ze\(href=\|$\)', '\1"', 'g')
" Create a new file
:new
" Paste the contents of register h (entered in normal mode)
"hp
" Replace all double quotes with new-lines
:s/"/\r/g
" Save (the buffer created by :new has no name yet)
:w list_of_links.txt
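To see what the substitute() call on the :g line does, here is the same pattern and replacement applied to a single made-up line of HTML (assuming Vim 8 or later; the `g:` variable names are arbitrary). The lazy `.\{-}` on each side of the link, together with `\ze\(href=\|$\)`, makes every match run from the text before one `href=` up to the next `href=` (or the end of the line), so each link collapses to its URL followed by a `"` delimiter.

```vim
" Pattern and replacement copied from the :g/.../let @h .= ... line,
" applied to one sample line.
let g:line = '<p><a href="/a.html">A</a> and <a href="/b/c.html">B</a></p>'
let g:out = substitute(g:line, '.\{-}href="\([^"]\+\)".\{-}\ze\(href=\|$\)', '\1"', 'g')
" Each link becomes URL + '"', the delimiter that :s/"/\r/g later turns
" into line breaks.
echo g:out
" Splitting on the delimiter gives the individual URLs.
echo split(g:out, '"')
```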

Finally, you could do it in a function with a for loop, but I'll leave that for someone else to write!
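A function with a loop along those lines could look roughly like this. It is a sketch assuming Vim 8 or later, and the name `ExtractLinks` is hypothetical. For each line it calls `matchstr()` repeatedly, restarting just after the previous match via `matchend()`, so several links on one line are all collected; the list is then written with `writefile()`, which overwrites {fname}.

```vim
" Hypothetical function: collect every href="..." value in the current
" buffer (several per line are fine) and write them, one per line, to {fname}.
function! ExtractLinks(fname) abort
  " \zs and \ze keep only the part between the quotes
  let l:pat = 'href="\zs[^"]\+\ze"'
  let l:links = []
  for l:line in getline(1, '$')
    let l:start = 0
    while 1
      let l:url = matchstr(l:line, l:pat, l:start)
      if empty(l:url)
        break
      endif
      call add(l:links, l:url)
      " Resume searching right after this match (the closing quote)
      let l:start = matchend(l:line, l:pat, l:start)
    endwhile
  endfor
  call writefile(l:links, a:fname)
  return l:links
endfunction
```

Usage: `:call ExtractLinks('list_of_links.txt')`.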

DrAl

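Put your cursor on the first line, first column (for example with gg0) and try this (the ! after :redir lets it overwrite output.txt if that file already exists):

:redir! > output.txt|while search('href="', "We")|exe 'normal yi"'|echo @"|endwhile|redir END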
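The same cursor-walking idea can also be written as a function that collects the values in a list instead of going through :redir and the unnamed register. A sketch assuming Vim 8 or later; `SearchHrefs` is a made-up name, it starts from the top of the buffer on its own, and it restores the cursor position afterwards.

```vim
" Hypothetical function: walk the buffer with search() and collect the value
" of every href="..." into a list, then write that list to {fname}.
function! SearchHrefs(fname) abort
  let l:save = getpos('.')
  call cursor(1, 1)
  let l:links = []
  " 'c' also accepts a match right at line 1, column 1 on the first pass;
  " after that, 'W' alone moves strictly forward without wrapping.
  let l:flags = 'cW'
  while search('href="', l:flags) > 0
    let l:flags = 'W'
    " Text from the cursor (the 'h' of href=) to the end of the line
    let l:rest = strpart(getline('.'), col('.') - 1)
    let l:url = matchstr(l:rest, '^href="\zs[^"]*\ze"')
    if !empty(l:url)
      call add(l:links, l:url)
    endif
  endwhile
  call setpos('.', l:save)
  call writefile(l:links, a:fname)
  return l:links
endfunction
```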
Brian Carper

Have you tried this?

:g/href="\v([a-z_/]+)"/w >> outfile

Jeff Meatball Yang
  • This doesn't work. It results in the search term being correctly found, but it then simply outputs the entire contents of the file into the new outfile. – Sasha Jun 23 '09 at 05:45