I have a string like this:

<a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -

stored in a variable a.

I tried to extract the sequence 2021_03_19 from it (the second occurrence, the one after the /"> sequence) with the following script:

a=${a##'/">'}
a=${a%%'/</a'}

But the final result is the same string as the input.

Alin Tudor
    That looks like HTML, but the `/"` is invalid in HTML. Is this only part of an HTML file? If so, use `xmllint`, `xmlstarlet`, or another XML-aware tool. – KamilCuk Mar 23 '21 at 08:30

3 Answers

The pattern also has to match the text before the prefix marker and after the suffix marker, so add a * wildcard on each side.

a=${a##*'/">'}
a=${a%%'/</a'*}
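
To make the effect of the wildcard visible, here is a minimal sketch that uses the string from the question. The variable names `unchanged`, `rest`, and `result` are only illustrative and do not appear in the original post.

```shell
a='<a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -'

# Without a wildcard the pattern must match at the very start of the value.
# The value starts with '<a href=', not '/">', so nothing is removed.
unchanged=${a##'/">'}

# With a leading *, the pattern covers everything up to and including '/">'.
rest=${a##*'/">'}

# With a trailing *, the pattern covers '/</a' and everything after it.
result=${rest%%'/</a'*}

printf '%s\n' "$result"
```

Using separate variables here is only a choice for readability; reassigning `a` twice, as above, gives the same final value.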
KamilCuk

The pattern in the parameter expansion needs to match the entire string you want to remove. You are trying to trim the literal prefix /"> but the value does not begin with those characters, so the parameter expansion does nothing.

Try

a=${a##*'/">'}
a=${a%%'/</a'*}

The single quotes are kind of unusual; I would perhaps instead backslash-escape each metacharacter which should be matched literally.

a=${a##*/\"\>}
a=${a%%/\</a*}
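
As a quick check that the two spellings behave the same, the following sketch runs both on the string from the question. The variable names `quoted` and `escaped` are assumptions made for this illustration.

```shell
a='<a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -'

# Form 1: the literal parts are single-quoted, the wildcard stays unquoted.
quoted=${a##*'/">'}
quoted=${quoted%%'/</a'*}

# Form 2: each shell metacharacter is backslash-escaped instead.
escaped=${a##*/\"\>}
escaped=${escaped%%/\</a*}

printf '%s\n%s\n' "$quoted" "$escaped"
```

In both forms the `*` must remain outside the quotes or escapes, otherwise it would be matched as a literal asterisk rather than as a wildcard.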
tripleee

You could use:

a='<a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -'
b=${a#*>}
c=${b%%/<*}

Based on "Extract substring in Bash".

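In your example the number of # characters has nothing to do with the number of characters you want to match: # removes the shortest matching prefix and ## the longest, and there is no ### operator, which is why trying it does not work either. The real problem is the missing * wildcard, so the above is an alternative that includes it.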
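
As a small sketch of what the doubled operator character actually selects, the following uses a made-up value `a/b/c` (not from the question). A single `#` or `%` trims the shortest match, a doubled one trims the longest; the count is never a number of characters.

```shell
p='a/b/c'

shortest_prefix=${p#*/}    # trims 'a/'   and leaves 'b/c'
longest_prefix=${p##*/}    # trims 'a/b/' and leaves 'c'
shortest_suffix=${p%/*}    # trims '/c'   and leaves 'a/b'
longest_suffix=${p%%/*}    # trims '/b/c' and leaves 'a'

printf '%s\n' "$shortest_prefix" "$longest_prefix" "$shortest_suffix" "$longest_suffix"
```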