
I want to extract the value of the src attribute from img tags. For the two samples below, the expected results are:

  1. Images/17/0000894189/0000894189-17-005831/image00003.jpg
  2. Images/17/0000894189/0000894189-17-005831/image0.jpg

Samples:

<div style="TEXT-ALIGN: center"><img src="Images/17/0000894189/0000894189-17-005831/image00003.jpg"></div>

<div style="TEXT-ALIGN: justify"><iMg style="HEIGHT: 63px; WIDTH: 289px" src="Images/17/0000894189/0000894189-17-005831/image0.jpg"></div>

Could you please suggest a regular expression that gives me this value? The position of the src attribute within the img tag can vary.

YakovL
user423574
  • I would suggest using a proper library to get the attribute. I'm not sure which programming language you are using, though. – SMA Jan 22 '18 at 09:23
  • Possible duplicate of [Regex Tag parsing with src, width, height](https://stackoverflow.com/questions/36978966/regex-img-tag-parsing-with-src-width-height) – Roger Lipscombe Jan 22 '18 at 09:23
  • Don't use a regex to parse HTML. What if you have a newline in your tag? – JXG Jan 22 '18 at 09:39
  • I'm using C#. You're right, there could be a newline in the tag. What other way do you suggest? – user423574 Jan 22 '18 at 13:31

1 Answer


It depends a bit on where you use the regex, but something like

.*src="\([^"]*\)".*

should give you the path you're looking for in sed (the escaped parentheses form the capture group in sed's basic regex syntax), e.g.

sed -n 's#.*src="\([^"]*\)".*#\1#p' inputfile
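To cover the concerns raised in the comments (a mixed-case tag name such as iMg, and a newline inside the tag), the sed idea can be combined with tr and grep. This is only a rough sketch, not a real HTML parser: the function name extract_src is made up for illustration, the src value is assumed to be double-quoted, and a ">" inside an attribute value would break it.

```shell
#!/bin/sh
# Hypothetical helper: print the src value of every <img> tag read from stdin.
# Assumptions: src is written in lowercase and its value is double-quoted.
extract_src() {
  # 1. Join all lines so a tag broken across lines becomes one line.
  # 2. Isolate each <img ...> tag; -i matches the tag name in any case.
  # 3. Keep only the text between src=" and the next double quote.
  tr '\n' ' ' |
    grep -oi '<img[^>]*>' |
    sed -n 's#.*[[:space:]]src="\([^"]*\)".*#\1#p'
}

# The two samples from the question; the second tag is deliberately
# split across two lines to exercise the newline case.
sample='<div style="TEXT-ALIGN: center"><img src="Images/17/0000894189/0000894189-17-005831/image00003.jpg"></div>
<div style="TEXT-ALIGN: justify"><iMg style="HEIGHT: 63px; WIDTH: 289px"
 src="Images/17/0000894189/0000894189-17-005831/image0.jpg"></div>'

printf '%s\n' "$sample" | extract_src
```

Isolating each tag first means the greedy leading .* in the sed expression only ever sees one img tag at a time, so several images on one line are each reported. For anything beyond simple, well-formed input, a proper HTML parsing library in your language is still the safer route.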
daniu