4

I'm wondering if it's possible (recommended might be the better word) to use sed to convert URLs into HTML hyperlinks in a document. That is, it would look for things like:

http://something.com

And replace them with

<a href="http://something.com">http://something.com</a>

Any thoughts? Could the same also be done for email addresses?

Mike Crittenden

5 Answers

5

This might work.

sed -i -e "s|http[:]//[^ ]*|<a href=\"&\">&</a>|g" yourfile.txt

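It relies on each URL ending at a space or at the end of the line, which isn't always the case: punctuation directly after a URL, such as a trailing comma, ends up inside the link.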

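You could do something similar for e-mail addresses. Note that this pattern needs extended regular expressions (-r), because +, ? and the grouping parentheses are literal characters in sed's default basic syntax:

sed -i -r -e "s|\w+@\w+\.\w+(\.\w+)?|<a href=\"mailto:&\">&</a>|g" yourfile.txt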

Those might get you started. I suggest leaving off the -i option to test your output before making the changes inline.

Jason R. Coombs
2

The file contains the following content:

http://something.com

The following command produces the expected output for that file:

sed -r 's|(.*)|<a href="\1">\1</a>|' file
muruga
  • This answer is trivial, provides no additional information over other answers previously given, and doesn't even output correct HTML for the example supplied (missing quotes). – Jason R. Coombs Mar 22 '10 at 17:07
  • Now it gives the correct answer. It outputs the quotes as well. – muruga Mar 23 '10 at 03:27
  • Not really. Remember that the OP has a document that contains other text. If you use (.*), you will wrap the whole line, other text included, in the link. – ghostdog74 Mar 23 '10 at 03:35
1
sed -i.bakup 's|http.[^ \t]*|<a href="&">&</a>|g' htmlfile
ghostdog74
  • It's safer to add `-r` (extended regex), otherwise it may fail on _sed: -e expression : unterminated `s' command_ – Noam Manos Sep 13 '20 at 14:19
0

You can use awk:

awk '
{
 for(i=1;i<=NF;i++){
   if ($i ~ /http/){
      $i="<a href=\042"$i"\042>"$i"</a>"
   }
 }
} 1 ' file

Output:

$ cat file
blah http://something.com test http://something.org

$ ./shell.sh
blah <a href="http://something.com">http://something.com</a> test <a href="http://something.org">http://something.org</a>
ghostdog74
-1

While you could use sed, I typically only use sed when I need something that's write-only (that is, it only needs to work and doesn't need to be maintained).

I find the Python regular expression library to be more accessible, and it gives you the ability to add more powerful constructs.

import re
import sys

def href_repl(matcher):
    "replace the matched URL with a hyperlink"
    # here you could analyze the URL further and make exceptions, etc
    #  to how you did the substitution. For now, do a simple
    #  substitution.
    href = matcher.group(0)
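    return '<a href="{href}">{href}</a>'.format(href=href)

# read the whole input file, closing it once the block exits
with open(sys.argv[1]) as f:
    text = f.read()
# \S* stops at any whitespace, so a URL at the end of a line does not swallow the next line
url_pattern = re.compile(re.escape('http://') + r'\S*')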
sys.stdout.write(url_pattern.sub(href_repl, text))

Personally, I find that much easier to read and maintain.
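
As a rough sketch of the "analyze the URL further" idea mentioned in the comment inside href_repl, the variant below handles URLs and e-mail addresses in one pass, keeps trailing punctuation outside the link, and HTML-escapes the link text. The names LINK_RE, TRAILING, link_repl and linkify are made up for this illustration, and the patterns are deliberately loose assumptions rather than a full URL or e-mail grammar.

```python
import html
import re

# Hypothetical, deliberately loose patterns (assumptions for illustration):
# a URL is http:// or https:// followed by non-whitespace; an e-mail address
# is word-ish characters, an @, and a dotted host name.
LINK_RE = re.compile(
    r'(?P<url>https?://\S+)'
    r'|(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)'
)

# Punctuation that usually belongs to the sentence, not to the link.
TRAILING = '.,;:!?)'


def link_repl(matcher):
    "replace one matched URL or e-mail address with a hyperlink"
    raw = matcher.group(0)
    target = raw.rstrip(TRAILING)   # drop sentence punctuation from the link
    tail = raw[len(target):]        # ...and put it back after the closing tag
    safe = html.escape(target)      # escape &, <, > and quotes for HTML
    prefix = 'mailto:' if matcher.group('email') else ''
    return '<a href="{0}{1}">{1}</a>{2}'.format(prefix, safe, tail)


def linkify(text):
    "wrap every URL and e-mail address found in text in an <a> element"
    return LINK_RE.sub(link_repl, text)


if __name__ == '__main__':
    print(linkify('See http://something.com, or mail bob@example.com.'))
```

Matching both kinds of link with a single alternation means an address-like fragment inside a URL is never processed a second time, which would be a risk with two separate substitution passes.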

Jason R. Coombs