How can I remove links from text? I think I should use the sed command, but I don't know the exact syntax.

llokely
  • You should show an example of what you have and what you want. Do you mean HTML links? What do you want to do with the rest of the HTML in the file? You should use a Perl or Python lib or another tool that is specialized for manipulating HTML. Regular expressions are [insufficient](http://stackoverflow.com/q/1732348/26428#1732454). – Dennis Williamson Nov 24 '10 at 17:22
  • possible duplicate of [Find Links and Remove them from HTML](http://stackoverflow.com/questions/1784507/find-links-and-remove-them-from-html) – Dennis Williamson Nov 24 '10 at 17:24
  • My text looks like this: lallalalala http://blabla.com babababab http://hehehe.org. – llokely Nov 25 '10 at 10:58
  • possible duplicate of [sed to remove URLs from a file](http://stackoverflow.com/questions/4283344/sed-to-remove-urls-from-a-file) – johnsyweb Nov 27 '10 at 05:58

1 Answer

This will remove everything ending in .com or .org:

sed 's/\s\?\w\+\.\(com\|org\)//g' foo.txt

input:

lallalalala blabla.com babababab hehehe.org. 

output:

lallalalala babababab.

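EDIT: here it is with POSIX character classes instead of \s and \w. I also added some more characters to match cases where there may be subdomains or protocols (http://). Note that \?, \+ and \| are GNU extensions to basic regular expressions, so this is not strictly POSIX and may not work in other sed implementations.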

sed 's/[[:space:]]\?[A-Za-z0-9_\/\:\.-]\+\.\(com\|org\)//g' foo.txt 

Also note that this does not cover all possible URL characters or URLs that reference a resource after the domain suffix (example.com/query?foo=bar).
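If the links always start with http:// or https://, as in the sample text from the comments, one way around both limitations is to match on the scheme instead of the domain suffix. The following is only a sketch under that assumption: `strip_urls` is a made-up helper name, and the `-E` flag (extended regular expressions) is assumed to be available, as it is in GNU and BSD sed. The final bracket expression keeps trailing punctuation such as a sentence-ending period out of the match.

```shell
# Hypothetical helper: remove http:// and https:// URLs read from stdin.
# Assumes a sed that supports -E (extended regular expressions).
strip_urls() {
  # [[:space:]]*            whitespace before the URL, removed with it
  # https?://               the scheme
  # [^[:space:]]*           the rest of the URL, up to the next whitespace
  # [^[:space:].,;:!?)]     last URL character; must not be punctuation,
  #                         so a trailing "." or "," stays in the text
  sed -E 's#[[:space:]]*https?://[^[:space:]]*[^[:space:].,;:!?)]##g'
}

printf '%s\n' 'lallalalala http://blabla.com babababab http://hehehe.org.' | strip_urls
# prints: lallalalala babababab.
```

To process a file instead, use `strip_urls < foo.txt`. Bare domains without a scheme (blabla.com) are left untouched by this pattern, and a URL at the very start of a line leaves the space that followed it.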

Brian Clements