
I'm writing a command-line tool that uses WKWebView to capture screenshots of webpages. To do this, I have to ensure that the page is fully loaded, including all client-side redirects, before capturing the screenshot.

In general, this happens automatically, and I just have to wait for webView(_:didFinish:) to be called. However, sometimes a redirect happens after webView(_:didFinish:) has already been called for the original URL (e.g. all Google search result links).

To handle this, I watch for new requests after loading is complete via webView(_:decidePolicyFor:) and repeatedly call webView.load on the new requests until no more are generated.

The problem is that websites make lots of requests after a page is fully loaded that aren't redirects, e.g. sending tracking data. I end up calling webView.load on those requests too, so instead of screenshotting e.g. a blog post, I screenshot the URL that an embedded tracking script sent data to, which is just a blank page.

Is there any way I can distinguish client-side redirect requests from these background AJAX requests? Or, failing that, is there some other way to follow redirects without calling webView.load on each new request?

Here is a functioning simplified version of my code:

import WebKit

@MainActor
class WebContainer: NSObject {
    
    private lazy var webView: WKWebView = {
        let webView: WKWebView = WKWebView()
        webView.navigationDelegate = self
        return webView
    }()
    private var redirectURL: URL? // set every time decidePolicyFor navigationAction is called
    private var loadedURL: URL? // set when didFinish is called
    
    private var navigationFailed = false // set when a navigation fails
    private var continuation: UnsafeContinuation<Void, Error>?
    private func load(request: URLRequest) async throws {
        try await withUnsafeThrowingContinuation { continuation in // required in the absence of an event loop, as this is a command line tool
            self.continuation = continuation
            webView.load(request)
        }
    }
    
    private func takeScreenshot() async throws -> Data? {
        // get data from webView and do stuff
    }
    
    private func checkRedirect() async -> Bool {
        try? await Task.sleep(for: .seconds(0.1)) // small delay to wait for new requests; try? avoids a crash if the task is cancelled
        return redirectURL != loadedURL // true if redirectURL has been set to something new
    }
    
    func generateData(type: DataType, request: URLRequest) async throws -> Data? {
        
        // Load website
        try await load(request: request)
        
        // Check for redirects after loading completed
        while await checkRedirect(), let url = redirectURL { // binding avoids force-unwrapping a nil URL
            let redirectRequest: URLRequest = URLRequest(url: url)
            try await load(request: redirectRequest)
        }
        
        // Process & return data
        return try await takeScreenshot()
    }
}

extension WebContainer: WKNavigationDelegate {
    
    func webView(_ webView: WKWebView, decidePolicyFor navigationAction: WKNavigationAction) async -> WKNavigationActionPolicy {
        redirectURL = navigationAction.request.url // if a new request is initiated after webView(_:didFinish:) is called, this will set redirectURL to the new URL
        return WKNavigationActionPolicy.allow
    }
    
    func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
        loadedURL = webView.url
        continuation?.resume(returning: ())
        continuation = nil // a continuation must be resumed exactly once; a later navigation would otherwise resume it again
    }

    func webView(_ webView: WKWebView, didFail navigation: WKNavigation!, withError error: Error) {
        navigationFailed = true
        continuation?.resume(throwing: error)
        continuation = nil
    }

    func webView(_ webView: WKWebView, didFailProvisionalNavigation navigation: WKNavigation!, withError error: Error) {
        navigationFailed = true
        continuation?.resume(throwing: error)
        continuation = nil
    }
}
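
One filter that might narrow this down, sketched here under assumptions rather than as a confirmed fix: as far as I understand, the navigation delegate is only consulted for frame navigations (including hidden iframes that trackers often use), and WKNavigationAction exposes targetFrame?.isMainFrame, so only main-frame actions would be candidates for a redirect. The NavigationSnapshot struct and isRedirectCandidate function below are hypothetical names; they model the decision as a pure function so it can be checked without WebKit.

```swift
import Foundation

/// Hypothetical snapshot of the fields read from a WKNavigationAction.
/// In the delegate these would come from:
///   isMainFrame: navigationAction.targetFrame?.isMainFrame
///   url:         navigationAction.request.url
struct NavigationSnapshot {
    let isMainFrame: Bool? // nil when the action has no target frame (e.g. a new window)
    let url: URL?
}

/// Returns true only for navigations that would replace the top-level page
/// with a URL other than the one that already finished loading.
func isRedirectCandidate(_ action: NavigationSnapshot, loadedURL: URL?) -> Bool {
    guard action.isMainFrame == true, let url = action.url else { return false }
    return url != loadedURL
}

// Example values; the URLs are placeholders.
let loaded = URL(string: "https://example.com/post")
let redirect = NavigationSnapshot(isMainFrame: true, url: URL(string: "https://example.com/final"))
let tracker = NavigationSnapshot(isMainFrame: false, url: URL(string: "https://tracker.example/pixel"))

print(isRedirectCandidate(redirect, loadedURL: loaded)) // true
print(isRedirectCandidate(tracker, loadedURL: loaded))  // false
```

In webView(_:decidePolicyFor:), redirectURL would then only be updated when this check passes. Since the policy already returns .allow, the web view should carry out a main-frame redirect itself, so waiting for the next webView(_:didFinish:) may be enough without calling webView.load again.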
mingwei