I'm using sed to parse an HTML page. Here is the code:

name=`echo $p | sed -n 's/.*href=\"\([^"]*\)" class=\"alleLink iTitle\"><span>\([^<]*\)<\/span>.*/\1/p'`;

When there is a match it works well and returns the required substring. But when there is no match, sed just freezes and the script does nothing. I just want to receive an empty string or something like that.

Do you know what to do?

Thanks Roman Zkamene

  • Could you give links to a page that matches and one that fails? – potong Nov 17 '11 at 20:23
  • please edit your question above to show us a working and a non-working value for `$p`. My quick test did not have a problem exiting when it didn't match. Good luck. – shellter Nov 17 '11 at 20:31
  • I also have a couple of freezing `sed` processes. What's interesting is that `sed` is being executed from a Java process as a system call. If I execute the sed all by itself from the command line, it works without a hitch – hanzo2001 Sep 10 '19 at 11:04

2 Answers

I recommend installing the Perl module WWW::Mechanize with the command

cpan -i WWW::Mechanize

or searching your package manager for perl.*mechanize.

Then you will be able to run this command in the shell (interactively or not) to see all the links on a page:

mech-dump --links http://foobar.tld

Moreover, sed is not the right tool for parsing HTML. Python, Ruby, or Perl will be your best bet.

For example:

  • Python + lxml or Python + Beautiful Soup
  • Perl + WWW::Mechanize

One more thing:

You can use any character you want as the sed delimiter, so escaping / is not necessary, and the expression becomes more readable for everyone.
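As a rough sketch of the delimiter point, here is the expression from the question rewritten with `|` as the delimiter. The sample value of `$p` and the variable names `expr` and `none` are made up for illustration, since the real page markup is not shown. The second call suggests that, with `-n` and the `p` flag, a non-matching line should simply produce an empty string.

```shell
# Hypothetical sample line; the real page markup may differ.
p='<a href="/item/42" class="alleLink iTitle"><span>Some title</span></a>'

# With | as the delimiter, the / in </span> needs no escaping.
expr='s|.*href="\([^"]*\)" class="alleLink iTitle"><span>\([^<]*\)</span>.*|\1|p'

# Matching input: prints the captured href value.
name=$(printf '%s\n' "$p" | sed -n "$expr")
printf 'match:    [%s]\n' "$name"

# Non-matching input: -n suppresses the line, so the result is empty.
none=$(printf '%s\n' '<p>no link here</p>' | sed -n "$expr")
printf 'no match: [%s]\n' "$none"
```

Quoting `"$p"` and using `$(...)` instead of backticks are small changes from the original that keep whitespace in the input intact.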

Gilles Quénot
  • Thanks for the replies. Unfortunately I don't have rights on the server, so I cannot install anything. I tried Perl for regular expressions but it was very slow. I really need the script to be fast, and when I tried Perl it was crunching one page for maybe 5 seconds, which is too much. For that reason I don't use any complex HTML parser. I admit that sed isn't ideal, but it is faster than Perl. – Roman Zkamene Nov 17 '11 at 19:14
  • You can install third generation language modules in your home directory as well ;) – Gilles Quénot Nov 17 '11 at 19:16
  • OK, thanks ... it's a possibility. But it will take time. It would be better to just solve this little problem with sed, because it's nearly working. I just need some try-catch-like thing that ignores the freezing. Do you have any idea? – Roman Zkamene Nov 17 '11 at 19:38
  • For this, provide a full example with a real-world HTML line. – Gilles Quénot Nov 17 '11 at 19:40

A couple of points:

  1. This one has to be inevitably the first

  2. You can simplify the expression using the -r switch for sed
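To illustrate the second point, here is a minimal sketch of the same substitution with extended regular expressions, assuming a sed that accepts `-E` (GNU sed also accepts `-r` for the same thing). The sample value of `$p` and the variable name `link` are hypothetical.

```shell
# Hypothetical sample line; the real page markup may differ.
p='<a href="/item/42" class="alleLink iTitle"><span>Some title</span></a>'

# With -E, groups are written ( ) instead of \( \).
# Using # as the delimiter also avoids escaping the / in </span>.
link=$(printf '%s\n' "$p" |
  sed -n -E 's#.*href="([^"]*)" class="alleLink iTitle"><span>([^<]*)</span>.*#\1#p')

printf '%s\n' "$link"
```

If neither `-E` nor `-r` is accepted by the installed sed, the basic-expression form from the question is the fallback.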

ata
  • 1. I don't have experience with any parser, so it's easier for me to use regular expressions. 2. I'm sorry, maybe I have an older version of sed, but mine doesn't know the -r switch. Thanks – Roman Zkamene Nov 17 '11 at 19:28
  • @RomanZkamene: the `-r` switch is a GNU sed switch to turn on extended expressions. If you're on a non-Linux system, try invoking `gsed` or, if that doesn't exist, try `sed -E`, which will turn on extended expressions for some versions of `sed`. If that still fails, run `man sed` and see if extended expressions are supported in any way. – sorpigal Nov 18 '11 at 12:35