
I have this tag:

<img data-original="http://www.video.mediaset.it/bin/515.$plit/640x360_C_2_video_773293_videoThumbnail.jpg" alt="I vincitori di Maria Express" class="img-responsive lazy" width="323" height="186">

but I need to extract this URL: http://www.video.mediaset.it/bin/515.$plit/640x360_C_2_video_773293_videoThumbnail.jpg

With my regular expression /img data-original="[s]?:\/\/)?([^\/\s]+\/)(.*) I am not able to get the URL.

Can anyone help?

Sisso
  • Will the text in the quotes ever _not_ be a URL? If it's always a URL, why not just match `img data-original="(.+?)"` – Sweeper Nov 06 '17 at 17:22
  • @Sweeper With that I get the whole tag, with all its attributes – Sisso Nov 06 '17 at 17:25
  • @ctwheels How do I extract data-original? – Sisso Nov 06 '17 at 17:32
  • @Sisso See [this](https://stackoverflow.com/questions/2168610/which-html-parser-is-the-best) question. But basically something like [this](https://jsoup.org/cookbook/extracting-data/attributes-text-html) article mentions (assuming jsoup is what's being used) – ctwheels Nov 06 '17 at 17:34

3 Answers

Can you try this? This pattern matches links that start with http and end with jpg:

http(.+?)jpg

Or

http([^"]*)

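This one starts at http and runs up to the next ". In both patterns the URL is the whole match, because the capture group leaves out the leading http.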

tested at this link
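
As a rough sketch of how the two patterns above might be used from Java (the class and method names here are made up for illustration), the URL is read from the whole match via `group()` rather than from group 1:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class HttpUrlFinder {
    // Lazy match from "http" up to the first "jpg" that follows.
    static final Pattern ENDS_WITH_JPG = Pattern.compile("http(.+?)jpg");
    // Match from "http" up to (not including) the next double quote.
    static final Pattern UNTIL_QUOTE = Pattern.compile("http([^\"]*)");

    // Returns the first whole match of the pattern in the input, if any.
    static Optional<String> firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String tag = "<img data-original=\"http://www.video.mediaset.it/bin/515.$plit/"
                + "640x360_C_2_video_773293_videoThumbnail.jpg\" "
                + "alt=\"I vincitori di Maria Express\" class=\"img-responsive lazy\" "
                + "width=\"323\" height=\"186\">";
        System.out.println(firstMatch(ENDS_WITH_JPG, tag).orElse("no match"));
        System.out.println(firstMatch(UNTIL_QUOTE, tag).orElse("no match"));
    }
}
```

Note that neither pattern is tied to the data-original attribute, so on a full page they would also match any other http URL that appears first.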

Cagri Yalcin

Try using this helpful website to check that your regexes are correct; you should then be able to copy them into your Java code. The site also has a handy tool: highlight part of a regex and it explains what that part does. I have done this many times for the regular expressions in my own code. As a final test, you can paste the regular expressions into a text editor such as Notepad++ and run them against your strings to confirm they behave as desired before using them in Java.
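
One thing to watch when pasting a tested pattern into Java: it becomes a string literal, so every backslash and every double quote needs a backslash in front of it. A minimal sketch, using a made-up pattern with `\s*` around the `=` purely to show the escaping (the class and method names are hypothetical too):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not taken from the question or the answers.
class EscapingDemo {
    // As typed in a regex tester:  data-original\s*=\s*"([^"]*)"
    // In the Java literal below, each backslash is doubled and each
    // double quote is preceded by a backslash.
    static final Pattern ATTR =
            Pattern.compile("data-original\\s*=\\s*\"([^\"]*)\"");

    // Returns the attribute value, or null when the attribute is absent.
    static String value(String input) {
        Matcher m = ATTR.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(value("<img data-original = \"http://example.com/thumb.jpg\">"));
    }
}
```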

cincy_anddeveloper

Add some contrast: the value you want is a string delimited by " characters.

So try:

img data-original="([^"]*)"

That way the capture group gathers every character that is not a ", which is exactly the URL between the quotes.

Demo.
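
For illustration, here is how that pattern could be applied in Java; this is only a sketch, and the class and method names are assumptions rather than anything from the question. The URL comes from capture group 1:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class DataOriginalExtractor {
    // Same pattern as above, with the double quotes escaped for a Java literal.
    static final Pattern DATA_ORIGINAL =
            Pattern.compile("img data-original=\"([^\"]*)\"");

    // Collects the data-original value of every matching img tag.
    static List<String> extractAll(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = DATA_ORIGINAL.matcher(html);
        while (m.find()) {
            urls.add(m.group(1)); // group 1 is the text between the quotes
        }
        return urls;
    }

    public static void main(String[] args) {
        String tag = "<img data-original=\"http://www.video.mediaset.it/bin/515.$plit/"
                + "640x360_C_2_video_773293_videoThumbnail.jpg\" "
                + "alt=\"I vincitori di Maria Express\" class=\"img-responsive lazy\" "
                + "width=\"323\" height=\"186\">";
        for (String url : extractAll(tag)) {
            System.out.println(url);
        }
    }
}
```

This relies on data-original being the first attribute directly after `img `. If the attribute order can vary, an HTML parser such as jsoup, as suggested in the comments, is likely the more robust choice.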

PJProudhon