Issue with regex expression

Question

I have this tag:

<img data-original="http://www.video.mediaset.it/bin/515.$plit/640x360_C_2_video_773293_videoThumbnail.jpg" alt="I vincitori di Maria Express" class="img-responsive lazy" width="323" height="186">

I need to extract this URL from it: http://www.video.mediaset.it/bin/515.$plit/640x360_C_2_video_773293_videoThumbnail.jpg

With my regular expression /img data-original="[s]?:\/\/)?([^\/\s]+\/)(.*) I am not able to get the URL.

Who can help me?

Will the text in the quotes ever _not_ be a URL? If it's always a URL, why not just match `img data-original="(.+?)"` — Sweeper, Nov 06 '17 at 17:22
@Sisso See [this](https://stackoverflow.com/questions/2168610/which-html-parser-is-the-best) question. Basically, use an HTML parser and do something like what [this](https://jsoup.org/cookbook/extracting-data/attributes-text-html) article describes (assuming jsoup is what's being used) — ctwheels, Nov 06 '17 at 17:34

Cagri Yalcin · Answer 1 · 2017-11-06T17:43:27.320

0

Can you try this? This pattern matches links that end with .jpg:

http(.+?)jpg

Or

http([^"]*)

This one starts at http and matches everything up to the next " character.

tested at this link
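
A minimal Java sketch of the two patterns above, assuming the standard java.util.regex API. The class name HttpPatternDemo and the helper firstMatch are hypothetical names used only for illustration. Note that the parentheses in both patterns capture only the text after "http", so the sketch reads the whole match with group() rather than group(1).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HttpPatternDemo {
    // Hypothetical helper: returns the whole match (group 0) of the
    // first occurrence of regex in input, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String tag = "<img data-original=\"http://www.video.mediaset.it/bin/515.$plit/"
                + "640x360_C_2_video_773293_videoThumbnail.jpg\" "
                + "alt=\"I vincitori di Maria Express\" class=\"img-responsive lazy\">";

        // Lazy match: from "http" up to the first "jpg" that follows.
        System.out.println(firstMatch("http(.+?)jpg", tag));

        // From "http" up to (not including) the next double quote.
        System.out.println(firstMatch("http([^\"]*)", tag));
    }
}
```

For this tag both calls should yield the full URL. The first pattern stops at the first "jpg" it finds, so it would cut a URL short if "jpg" also appeared earlier in the path; the second does not depend on the file extension.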

edited Nov 06 '17 at 17:43

answered Nov 06 '17 at 17:23


Do you need only the URL? – Cagri Yalcin Nov 06 '17 at 17:24
Do all your links end with jpg? – Cagri Yalcin Nov 06 '17 at 17:35

score 0 · Answer 2 · answered Nov 06 '17 at 17:24

Try using this helpful website to make sure that your regexes are correct; you should then be able to copy and paste them into Java (remember to escape backslashes and double quotes inside a Java string literal). The website also has a nifty tool where you highlight parts of your regex and it indicates what they do. I've done this many times for the regular expressions in my own code. As a final test, you can copy the regular expressions into a text editor such as Notepad++ and run them against your strings to check that they behave as desired before using them in Java code.

score 0 · Accepted Answer · answered Nov 06 '17 at 17:36


Add some contrast: the value you want is a string delimited by " characters.

So try:

img data-original="([^"]*)"

That way the capture group collects every character that is not a ", which is everything up to the closing quote.

Demo.
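
As a rough sketch of how this pattern could be used from Java (the question's target language, judging by the other answers), assuming the standard java.util.regex API. The class name DataOriginalExtractor and the method extract are hypothetical. Here group(1) is the right choice, because the capture group wraps exactly the attribute value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DataOriginalExtractor {
    // The answer's pattern; inner double quotes are escaped for the Java string literal.
    private static final Pattern DATA_ORIGINAL =
            Pattern.compile("img data-original=\"([^\"]*)\"");

    // Hypothetical helper: returns the data-original value of the first
    // matching img tag, or null if the pattern does not match.
    static String extract(String html) {
        Matcher m = DATA_ORIGINAL.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tag = "<img data-original=\"http://www.video.mediaset.it/bin/515.$plit/"
                + "640x360_C_2_video_773293_videoThumbnail.jpg\" "
                + "alt=\"I vincitori di Maria Express\" class=\"img-responsive lazy\" "
                + "width=\"323\" height=\"186\">";
        System.out.println(extract(tag));
    }
}
```

The pattern only matches when data-original is the first attribute, separated from img by a single space. If the attribute order or spacing can vary, an HTML parser such as the one suggested in the comments is likely the more robust option.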


PJProudhon

