Bash - sed command is freezing when nothing is matched

Question

I'm using sed to parse an HTML page. Here is the code:

name=`echo $p | sed -n 's/.*href=\"\([^"]*\)" class=\"alleLink iTitle\"><span>\([^<]*\)<\/span>.*/\1/p'`;

When there is a match it works well and returns the required substring. But when there is no match, sed just freezes and the script does nothing. I just want to receive an empty string or something like that.

Do you know what to do?

Thanks Roman Zkamene

Could you give links to a page that matches and one that fails? – potong, Nov 17 '11 at 20:23
Please edit your question above to show us a working and a non-working value for `$p`. My quick test did not have a problem exiting when it didn't match. Good luck. – shellter, Nov 17 '11 at 20:31
I also have a couple of freezing `sed` processes. What's interesting is that `sed` is being executed from a Java process as a system call. If I execute the sed all by itself from the command line, it works without a hitch. – hanzo2001, Sep 10 '19 at 11:04
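
A quick test of the kind shellter describes might look like the sketch below. The two sample lines are hypothetical, since the real value of `$p` is not shown in the question, and `extract` is just an illustrative helper name. With `-n` and the `p` flag, sed prints nothing when the substitution does not match, so the variable ends up empty and the command still exits.

```shell
# Hypothetical sample input; the actual content of $p is not given in the question.
match='<a href="/item/42" class="alleLink iTitle"><span>Some title</span></a>'
nomatch='<p>no link here</p>'

extract() {
  # Quote the argument so whitespace survives, and feed it to sed on stdin.
  printf '%s\n' "$1" |
    sed -n 's/.*href="\([^"]*\)" class="alleLink iTitle"><span>\([^<]*\)<\/span>.*/\1/p'
}

name_match=$(extract "$match")
name_nomatch=$(extract "$nomatch")

printf 'match:   [%s]\n' "$name_match"
printf 'nomatch: [%s]\n' "$name_nomatch"
```

If the real script still hangs, one thing worth checking is whether sed is actually receiving input that ends: without a file argument sed reads standard input and waits until it is closed. Whether that is the cause here cannot be told from the question.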

score 1 · Answer 1 · answered Nov 17 '11 at 19:08

I recommend installing the Perl module WWW::Mechanize with the command

cpan -i WWW::Mechanize

or search in your package manager for perl.*mechanize

Then you will be able to run this command in the shell (interactive or not) to see all the links on a page:

mech-dump --links http://foobar.tld

Moreover, sed is not the right tool for parsing HTML. Python, Ruby, or Perl will be your best bet.

For example:

python + lxml or python + Beautiful Soup
perl + WWW::Mechanize

One more thing:

You can use any character other than a backslash or newline as the sed delimiter, so escaping / is not necessary and the expression will be more readable for everyone.
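
As a minimal sketch of the delimiter point, here is the expression from the question rewritten with `#` as the delimiter, so `</span>` needs no backslash. The sample value of `$p` is hypothetical, because the question does not show a real one.

```shell
# Hypothetical sample line standing in for the real $p.
p='<a href="/item/42" class="alleLink iTitle"><span>Some title</span></a>'

# Same substitution as in the question, but delimited with # instead of /.
name=$(printf '%s\n' "$p" |
  sed -n 's#.*href="\([^"]*\)" class="alleLink iTitle"><span>\([^<]*\)</span>.*#\1#p')

printf '%s\n' "$name"
```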

answered Nov 17 '11 at 19:08

Gilles Quénot


Thanks for the replies. Unfortunately I don't have rights on the server, so I cannot install anything. I tried Perl for regular expressions, but it was very slow. I really need the script to be fast, and when I tried Perl it was crunching one page for maybe 5 seconds, which is too much. For that reason I don't use any complex HTML parser. I admit that sed isn't ideal, but it is faster than Perl. – Roman Zkamene Nov 17 '11 at 19:14
You can install third generation language modules in your home directory as well ;) – Gilles Quénot Nov 17 '11 at 19:16
OK, thanks ... it's a possibility, but it will take time. It would be better to just solve this little problem with sed, because it's nearly working. I just need some try-catch-like construct to ignore the freezing. Do you have any ideas? – Roman Zkamene Nov 17 '11 at 19:38
For that, please provide a full example with a real-world HTML line. – Gilles Quénot Nov 17 '11 at 19:40

score 1 · Answer 2 · edited May 23 '17 at 11:57

A couple of points:

1. This one has to be inevitably the first
2. You can simplify the expression using the -r switch for sed
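
A sketch of what the simplified expression could look like with extended regular expressions, where groups are written `( )` without backslashes. It uses `-E` rather than `-r` on the assumption that the installed sed accepts it (GNU sed takes either; very old versions may accept neither). The sample `$p` is hypothetical, `#` is used as the delimiter, and the second capture group from the question is dropped because only `\1` is printed.

```shell
# Hypothetical sample line standing in for the real $p.
p='<a href="/item/42" class="alleLink iTitle"><span>Some title</span></a>'

# -E enables extended regular expressions; assumed to be supported by this sed.
name=$(printf '%s\n' "$p" |
  sed -E -n 's#.*href="([^"]*)" class="alleLink iTitle"><span>[^<]*</span>.*#\1#p')

printf '%s\n' "$name"
```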

edited May 23 '17 at 11:57

Community


answered Nov 17 '11 at 19:09

ata


1. I don't have experience with any parser, so it's easier for me to use regular expressions. 2. I'm sorry, maybe I have an older version of sed, but mine doesn't know the -r switch. Thanks – Roman Zkamene Nov 17 '11 at 19:28
@RomanZkamene: the `-r` switch is a GNU sed switch to turn on extended expressions. If you're using non-Linux try invoking `gsed` or, if that doesn't exist, try `sed -E` which will turn on extended expressions for some versions of `sed`. If that still fails, `man sed` and see if extended expressions are supported in any way. – sorpigal Nov 18 '11 at 12:35
