
I have the string below and I want to extract everything from <img... up to the closing " after .jpg.

I tried the code below, but rfind(end) matches the very last " in the string rather than the first one after the URL.

Can anyone help?

start = 'img src="'
end = '"'
print string[string.find(start)+len(start):string.rfind(end)]

STRING:

 <p><a href="https://news.yahoo.com/us-ambassador-takes-post-united-nations-141833297.html"><img src="http://l1.yimg.com/uu/api/res/1.2/1f8jyGM.NfkxLb_.OgMaIQ--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/http://media.zenfs.com/en_us/News/afp.com/f5bbc19135065fcfff40e6ece9650f4ab225fa97.jpg" width="130" height="86" alt="New US ambassador takes up post at United Nations" align="left" title="New US ambassador takes up post at United Nations" border="0" ></a>US Ambassador Kelly Craft took up her post at the United Nations on Thursday, vowing to defend America's values and interests nine months after the departure of her high-profile predecessor Nikki Haley. Craft, 57, served previously as US ambassador to Canada where she was involved in negotiations on a new US Mexico Canada free trade agreement.<p><br clear="all">
juanpa.arrivillaga
kikee1222

2 Answers


You can use a regex like this, if you are sure the input will always have the same shape.

<img.*?jpg\"

Here is the link for it: Regex101. You can tweak the pattern to suit your requirements. A regex is a better fit here than combining string find and len.
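
As a rough illustration of how the pattern above could be applied in Python, here is a minimal sketch using the standard re module. The html value is a shortened, hypothetical stand-in for the string in the question, and the second pattern (capturing only the URL) is an assumption about what is ultimately wanted, not part of the original answer.

```python
import re

# Shortened, hypothetical stand-in for the string in the question.
html = (
    '<p><a href="https://example.com/story.html">'
    '<img src="http://example.com/photo.jpg" width="130" height="86" '
    'alt="caption" ></a>Some text<p><br clear="all">'
)

# Non-greedy match: from "<img" up to the first occurrence of 'jpg"'.
match = re.search(r'<img.*?jpg"', html)
img_prefix = match.group(0) if match else None
print(img_prefix)

# Variation (an assumption): capture only the URL inside src="...".
url_match = re.search(r'<img\s+src="([^"]+)"', html)
url = url_match.group(1) if url_match else None
print(url)
```

The non-greedy `.*?` is what stops the match at the first `jpg"` instead of running on to a later quote. Note that both patterns assume `src` is written exactly as in the sample; differently formatted HTML may need a real parser.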

Shivaraj

You could just use the .split() method if you don't want to use a regex.

str = """<p><a href="https://news.yahoo.com/us-ambassador-takes-post-united-nations-141833297.html"><img src="http://l1.yimg.com/uu/api/res/1.2/1f8jyGM.NfkxLb_.OgMaIQ--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/http://media.zenfs.com/en_us/News/afp.com/f5bbc19135065fcfff40e6ece9650f4ab225fa97.jpg" width="130" height="86" alt="New US ambassador takes up post at United Nations" align="left" title="New US ambassador takes up post at United Nations" border="0" ></a>US Ambassador Kelly Craft took up her post at the United Nations on Thursday, vowing to defend America's values and interests nine months after the departure of her high-profile predecessor Nikki Haley. Craft, 57, served previously as US ambassador to Canada where she was involved in negotiations on a new US Mexico Canada free trade agreement.<p><br clear="all">"""


# Note: the name `str` shadows the built-in type; it is kept here to match the assignment above.
# final should be just the URL
final = str.split("img src=\"")[1].split("\" width=")[0]

print(final)

Output:

http://l1.yimg.com/uu/api/res/1.2/1f8jyGM.NfkxLb_.OgMaIQ--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/http://media.zenfs.com/en_us/News/afp.com/f5bbc19135065fcfff40e6ece9650f4ab225fa97.jpg
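
For comparison, the find-based attempt from the question can probably be made to work by looking for the closing quote with find and a start offset, rather than with rfind. The following is a minimal sketch under that assumption; html is a shortened, hypothetical stand-in for the real string, and the names begin and finish are illustrative only.

```python
# Shortened, hypothetical stand-in for the string in the question.
html = (
    '<p><a href="https://example.com/story.html">'
    '<img src="http://example.com/photo.jpg" width="130" height="86" '
    'alt="caption" ></a>Some text<p><br clear="all">'
)

start = 'img src="'
end = '"'

begin = html.find(start)
if begin == -1:
    url = None  # no <img src="..."> in the input
else:
    begin += len(start)
    # str.find accepts a start index, so this returns the first
    # closing quote at or after the beginning of the URL.
    finish = html.find(end, begin)
    url = html[begin:finish] if finish != -1 else None

print(url)
```

Unlike the split version, this does not depend on ` width=` following the URL, only on the next double quote closing the attribute.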
wpercy
Parcevel
  • this outputs _all_ links in a single string, probably not ideal – wpercy Sep 12 '19 at 21:39
  • True, but using split("http://") and then adding the prefix back to each piece gives you a list of the URLs. Also, the question asked for the string between the two markers, which this code extracts. – Parcevel Sep 12 '19 at 21:42