
I'm trying to parse some HTML to pull out every link that comes after each occurrence of the string:

market_listing_row_link" href="

to gather a list of item URLs using only the Swift 4 Standard Library.

What I think I need is a loop that keeps checking characters until the full string is found, then reads the item URL that follows into an array until a double quote is reached, and repeats this process until the end of the file. If I remember correctly, C had a function (I think it was fgetc) that read one character at a time while advancing the file's position indicator. Is there anything similar in Swift?
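The fgetc-style loop described above can be sketched with only the standard library by letting a `String.Index` play the role of the file position indicator. The function name `extractAll(after:until:in:)` is hypothetical, and the sketch assumes a non-empty marker and drops any trailing value whose terminator is never found.

```swift
// Hypothetical helper (not from the question): scans `text` like an fgetc loop,
// with `position` acting as the file position indicator. Standard library only.
func extractAll(after marker: String, until terminator: Character, in text: String) -> [String] {
    guard !marker.isEmpty else { return [] }  // an empty marker would never advance
    var results: [String] = []
    var position = text.startIndex
    while position < text.endIndex {
        guard text[position...].hasPrefix(marker) else {
            // No marker here: move on by one character, like the next fgetc call.
            position = text.index(after: position)
            continue
        }
        // Skip past the marker, then collect characters until the terminator.
        var cursor = text.index(position, offsetBy: marker.count)
        var value = ""
        while cursor < text.endIndex, text[cursor] != terminator {
            value.append(text[cursor])
            cursor = text.index(after: cursor)
        }
        // Keep the value only if the terminator was actually found.
        if cursor < text.endIndex {
            results.append(value)
        }
        // Resume scanning from where the value ended.
        position = cursor
    }
    return results
}

let html = "<a class=\"market_listing_row_link\" href=\"http://a\">A</a>"
    + "<a class=\"market_listing_row_link\" href=\"http://b\">B</a>"
let urls = extractAll(after: "market_listing_row_link\" href=\"", until: "\"", in: html)
print(urls)  // ["http://a", "http://b"]
```

Checking `hasPrefix` at every index is O(n·m), which is usually fine for a page-sized string.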

My code so far only finds the first occurrence of the string, but there are 10 occurrences I need to find.

import Foundation

extension String {
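    /// Returns the text between the first occurrence of `from` and the next occurrence of `to`, or nil.
    func slice(from: String, to: String) -> String? {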
        return (range(of: from)?.upperBound).flatMap { substringFrom in
            (range(of: to, range: substringFrom..<endIndex)?.lowerBound).map { substringTo in
                String(self[substringFrom..<substringTo])
            }
        }
    }
}

let itemListURL = URL(string: "http://steamcommunity.com/market/search?appid=252490")!
let itemListHTML = try String(contentsOf: itemListURL, encoding: .utf8)
let itemURL = URL(string: itemListHTML.slice(from: "market_listing_row_link\" href=\"", to: "\"")!)!

print(itemURL)

// Prints only the first matching URL: http://steamcommunity.com/market/listings/252490/Wyrm%20Chest
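One way to go from the first match to all of them is to keep calling `range(of:range:)` while moving the start of the search window past each closing delimiter. The sketch below does that; `allSlices(from:to:)` is a hypothetical name, it uses Foundation just as the code above does, and it assumes both delimiters are non-empty.

```swift
import Foundation

extension String {
    /// Hypothetical extension of `slice(from:to:)` that collects every slice, not just the first.
    /// Assumes `from` and `to` are non-empty.
    func allSlices(from: String, to: String) -> [String] {
        var results: [String] = []
        var searchStart = startIndex
        // Find the next `from`, then the first `to` after it, within the unsearched remainder.
        while let open = range(of: from, range: searchStart..<endIndex),
              let close = range(of: to, range: open.upperBound..<endIndex) {
            results.append(String(self[open.upperBound..<close.lowerBound]))
            // Move the search window past the closing delimiter, like advancing a file position.
            searchStart = close.upperBound
        }
        return results
    }
}

let sample = "x market_listing_row_link\" href=\"http://a\" y market_listing_row_link\" href=\"http://b\" z"
print(sample.allSlices(from: "market_listing_row_link\" href=\"", to: "\""))
// ["http://a", "http://b"]
```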
ANoobSwiftly
    I’m posting this as a comment instead of an answer because it doesn’t directly answer your question. Have you considered using [XMLParser](https://developer.apple.com/documentation/foundation/xmlparser) instead? True XML parsing is generally preferred over regex shenanigans when it comes to HTML—see, for example, [this famous Stack Overflow answer.](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Alan Kantz Oct 20 '17 at 13:55
    @AlanKantz HTML is not XML unless it happens to actually be XHTML. – rmaddy Oct 20 '17 at 14:03
    @AlanKantz Forget that it's HTML. I want to search a string of nonsense for a sequence of characters, read the characters that follow that sequence into a string variable up to a certain character, and then keep searching for the next occurrence of that sequence to repeat the process. – ANoobSwiftly Oct 20 '17 at 14:13

1 Answer


You can use a regular expression to find every occurrence of text between two specific strings (check this SO answer) and use the extension method ranges(of:) from this answer to get the ranges of all matches of that regex pattern. You just need to pass the .regularExpression option to that method.


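import Foundation

extension String {
    /// Returns the range of every match of `string`, scanning left to right.
    func ranges(of string: String, options: CompareOptions = .literal) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var start = startIndex
        // Stop once the search start reaches the end so an empty match cannot repeat forever.
        while start < endIndex, let range = range(of: string, options: options, range: start..<endIndex) {
            result.append(range)
            // Resume after a non-empty match; after an empty match, step forward one character.
            start = range.lowerBound < range.upperBound ? range.upperBound : index(range.lowerBound, offsetBy: 1, limitedBy: endIndex) ?? endIndex
        }
        return result
    }
    /// Returns the substrings that lie between each occurrence of `from` and the next `to`.
    func slices(from: String, to: String) -> [Substring] {
        // Escape the delimiters so any regex metacharacters in them are matched literally.
        let pattern = "(?<=" + NSRegularExpression.escapedPattern(for: from) + ").*?(?="
            + NSRegularExpression.escapedPattern(for: to) + ")"
        return ranges(of: pattern, options: .regularExpression)
            .map { self[$0] }
    }
}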

Testing in a playground:

let itemListURL = URL(string: "http://steamcommunity.com/market/search?appid=252490")!
let itemListHTML = try! String(contentsOf: itemListURL, encoding: .utf8)
let result = itemListHTML.slices(from: "market_listing_row_link\" href=\"", to: "\"")
result.forEach({print($0)})

Result:

http://steamcommunity.com/market/listings/252490/Night%20Howler%20AK47
http://steamcommunity.com/market/listings/252490/Hellcat%20SAR
http://steamcommunity.com/market/listings/252490/Metal
http://steamcommunity.com/market/listings/252490/Volcanic%20Stone%20Hatchet
http://steamcommunity.com/market/listings/252490/Box
http://steamcommunity.com/market/listings/252490/High%20Quality%20Bag
http://steamcommunity.com/market/listings/252490/Utilizer%20Pants
http://steamcommunity.com/market/listings/252490/Lizard%20Skull
http://steamcommunity.com/market/listings/252490/Frost%20Wolf
http://steamcommunity.com/market/listings/252490/Cloth
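As an alternative to the lookbehind/lookahead pattern (not part of the answer above), the value can also be pulled out with a capture group via NSRegularExpression. In that case the delimiters are consumed by each match and only group 1 is kept. The function name `captures(between:and:in:)` is hypothetical; this is a sketch, not a drop-in replacement.

```swift
import Foundation

/// Hypothetical alternative to `slices(from:to:)`: uses a capture group instead of
/// lookbehind/lookahead and returns the text captured between each pair of delimiters.
func captures(between from: String, and to: String, in text: String) -> [String] {
    // Escape the delimiters so they are matched literally; `(.*?)` lazily captures the value.
    let pattern = NSRegularExpression.escapedPattern(for: from) + "(.*?)"
        + NSRegularExpression.escapedPattern(for: to)
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        // Convert group 1's NSRange back into a Range<String.Index>.
        Range(match.range(at: 1), in: text).map { String(text[$0]) }
    }
}

let snippet = "a market_listing_row_link\" href=\"http://a\" b market_listing_row_link\" href=\"http://b\""
print(captures(between: "market_listing_row_link\" href=\"", and: "\"", in: snippet))
// ["http://a", "http://b"]
```

Like the lookaround version, `.` does not match newlines by default, so a value that spans lines would not be captured.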

Leo Dabus