Remove string before and after characters in bash

Question

I have the following string stored in a variable a:

<a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -

I tried to extract the sequence 2021_03_19 from it (the second occurrence, the one that follows /">) with the following script:

a=${a##'/">'}
a=${a%%'/</a'}

But the final result is the same string as the input.

That looks like HTML, but the `/"` is invalid HTML. Is this only part of an HTML file? If so, use `xmllint`, `xmlstarlet`, or another XML-aware tool. - KamilCuk, Mar 23 '21 at 08:30

score 1 · Answer 1 · answered Mar 23 '21 at 08:31


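The pattern has to match everything you want removed, not just the delimiter: add a * so it also matches the text before the prefix delimiter and the text after the suffix delimiter.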

a=${a##*'/">'}
a=${a%%'/</a'*}
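
For illustration, here is a minimal sketch that runs both variants on the sample string from the question. The variable names `unchanged` and `result` are made up for this example.

```shell
#!/usr/bin/env bash
# Sample input taken from the question.
a='<a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -'

# Without *, the pattern is anchored to the start of the string.
# The string does not start with /"> so nothing is removed.
unchanged=${a##'/">'}

# With *, the pattern matches everything up to and including /">
result=${a##*'/">'}
# Likewise, the trailing * matches everything from /</a to the end.
result=${result%%'/</a'*}

printf '%s\n' "$result"   # prints 2021_03_19
```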

KamilCuk

score 1 · Accepted Answer · answered Mar 23 '21 at 08:32

The pattern in the parameter expansion needs to match the entire string you want to remove. You are trying to trim the literal prefix /"> but the string does not begin with those characters, so the parameter expansion does nothing. The same applies to the suffix: the string does not end with /</a.

Try

a=${a##*'/">'}
a=${a%%'/</a'*}

Single quotes inside a parameter expansion are somewhat unusual; I would probably backslash-escape each metacharacter that should be matched literally instead.

a=${a##*/\"\>}
a=${a%%/\</a*}
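
If the delimiters live in variables, double-quoting them inside the expansion makes the shell treat their contents literally rather than as glob patterns. The following is only a sketch; the function name `between` and its argument order are assumptions made for this example, not something from the question.

```shell
#!/usr/bin/env bash
# between STRING LEFT RIGHT   (hypothetical helper)
# Prints the text after the last LEFT and before the first following RIGHT.
between() {
    local s=$1 left=$2 right=$3
    s=${s##*"$left"}     # drop the longest prefix ending in LEFT
    s=${s%%"$right"*}    # drop the longest suffix starting with RIGHT
    printf '%s\n' "$s"
}

a='<a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -'
between "$a" '/">' '/</a'   # prints 2021_03_19
```

Because the delimiters are quoted, characters such as * or ? in them are matched as ordinary characters.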

score 1 · Answer 3 · answered Mar 23 '21 at 08:44

You could use:

a='<a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -'
b=${a#*>}
c=${b%%/<*}

Based on Extract substring in Bash

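A note on your attempt: the number of # characters is unrelated to the length of the pattern. # removes the shortest matching prefix and ## the longest one; there is no ### operator, which is why adding a third # did not work either. The snippet above is an alternative that cuts after the first > and then from /< onwards.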
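
To make the difference between the single and doubled operators concrete, here is a small sketch. The file name `archive.tar.gz` and all variable names are made up for the example; the `>` is quoted only to be safe across shells.

```shell
#!/usr/bin/env bash
a='<a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -'

# Prefix removal: # is the shortest match, ## the longest.
shortest_prefix=${a#*'>'}    # cuts through the FIRST ">"
longest_prefix=${a##*'>'}    # cuts through the LAST ">"

# Suffix removal: % is the shortest match, %% the longest.
f='archive.tar.gz'           # hypothetical file name
shortest_suffix=${f%.*}      # archive.tar
longest_suffix=${f%%.*}      # archive

printf '[%s]\n' "$shortest_prefix" "$longest_prefix" \
                "$shortest_suffix" "$longest_suffix"
# [2021_03_19/</a> 19-Mar-2021 11:55 -]
# [ 19-Mar-2021 11:55 -]
# [archive.tar]
# [archive]
```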
