I have an RSS feed whose description element contains HTML like this:

<description>
<![CDATA[
 <a href='https://www.24h.com.vn/bong-da/psg-trao-than-dong-mbappe-sieu-luong-bong-chi-kem-messi-real-vo-mong-c48a1112120.html' title='PSG trao thần đồng Mbappe siêu lương bổng: Chỉ kém Messi, Real vỡ mộng'><img width='130' height='100' src='https://image.24h.com.vn/upload/4-2019/images/2019-12-27/1577463916-359-thumbnail.jpg' alt='PSG trao thần đồng Mbappe siêu lương bổng: Chỉ kém Messi, Real vỡ mộng' title='PSG trao thần đồng Mbappe siêu lương bổng: Chỉ kém Messi, Real vỡ mộng' /></a><br />PSG trong nỗ lực giữ chân “sát thủ” Kylian Mbappe, sẵn sàng tăng lương khổng lồ - một động thái nhằm xua đuổi Real Madrid.
]]>
</description>

How can I get the value of src so that I can show the image? I also tried the approach from "Getting img url from RSS feed swift", but it doesn't work. Here is my code to get src (it always falls through to the else branch, so image = "nil"):

let regex: NSRegularExpression = try! NSRegularExpression(pattern: "<img.*?src=\"([^\"]*)\"", options: .caseInsensitive)
let range = NSMakeRange(0, description.count)
if let textCheck = regex.firstMatch(in: description, options: .withoutAnchoringBounds, range: range) {
    let text = (description as NSString).substring(with: textCheck.range(at: 1))
    image = text
} else {
    image = "nil"
}

Thanks for your help!

Dávid Pásztor
Trung Nguyen
  • Obligatory [you can't parse HTML with Regex](https://stackoverflow.com/a/1732454/3141234). Well actually, you can, in limited cases, but not generally. You always run into edge cases and bugs, and it's an all-around frustrating time. I would suggest you just use an HTML parser (like SwiftSoup), use it to parse the document, and just extract your `src` attribute's value from there. – Alexander Jan 02 '20 at 15:34
  • @Alexander-ReinstateMonica, *`XML`, not `HTML`. Both `RSS` and `HTML` are `XML`-based formats, but `RSS` **is not** `HTML`. – user28434'mstep Jan 02 '20 at 15:36
  • @Alexander-ReinstateMonica: I have already parsed the data successfully using the FeedKit library. My issue is that I cannot get the value of src. – Trung Nguyen Jan 02 '20 at 15:41
  • @user28434 if you want to get real technical about it, it's applicable to [SGML](https://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language), of which HTML was a variant (until recently). But most people don't know that (and shouldn't have to), and OP was talking about HTML, so I focused on HTML. – Alexander Jan 02 '20 at 15:44
  • @TrungNguyen Parsing out attributes should be a feature of the parsing library. I'm not familiar with FeedKit or its API, but you should look into it, it probably already has something to do this for you. – Alexander Jan 02 '20 at 18:39

1 Answer

You need to change your regex so that it matches single quotes as well as double quotes: the HTML string you're trying to parse wraps its attribute values in single quotes, unlike the one in the linked Q&A.

let regex: NSRegularExpression = try! NSRegularExpression(pattern: "<img.*?src=[\"\']([^\"\']*)[\"\']", options: .caseInsensitive)

If you are sure you only need to match single quotes, you can simplify the pattern by replacing [\"\'] with \'. Currently, the regex pattern will match both single and double quotes.
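
For completeness, here is a minimal sketch of how the pattern above might be wrapped into a reusable function. The helper name `firstImageSource(in:)` and the shortened sample string are hypothetical, made up for illustration; they are not part of FeedKit or of the original code. The sketch also builds the `NSRange` from the string's indices rather than from `count`, on the assumption that the input may contain characters whose UTF-16 length differs from their `Character` count.

```swift
import Foundation

/// Hypothetical helper: returns the value of the first `src` attribute
/// that follows an `<img` tag, or nil if the pattern does not match.
func firstImageSource(in html: String) -> String? {
    // Same pattern as above: accepts single or double quotes around the value.
    let pattern = "<img.*?src=[\"']([^\"']*)[\"']"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive) else {
        return nil
    }
    // NSRange is measured in UTF-16 code units, so derive it from the string itself.
    let range = NSRange(html.startIndex..<html.endIndex, in: html)
    guard let match = regex.firstMatch(in: html, options: [], range: range),
          let srcRange = Range(match.range(at: 1), in: html) else {
        return nil
    }
    return String(html[srcRange])
}

// Shortened, made-up sample in the same shape as the feed's description.
let sample = "<a href='https://example.com/a.html' title='Chỉ kém Messi'><img width='130' height='100' src='https://example.com/thumb.jpg' alt='thumbnail' /></a><br />Text"
if let src = firstImageSource(in: sample) {
    print(src)
}
```

Returning an optional instead of the string "nil" lets the caller tell "no image found" apart from a real URL with `if let` or `guard let`.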

Dávid Pásztor