Can anyone tell me what's going on here?
#lang racket
(require (planet neil/html-parsing:2:0)
         net/url xml/path xml)
(define page "http://stackoverflow.com/questions/18902934/compile-and-run-eclipse-project-from-command-prompt")
(define port (get-pure-port (string->url page)))
(define xexp (html->xexp port))
(displayln xexp)
(xexpr? xexp)
(define title (se-path* '(title) xexp))
(displayln title)
It downloads an HTML page and seems to convert it into an xexpr, in the sense that html->xexp doesn't fail and the result looks like what I'd expect.
But (xexpr? xexp) returns #f, and calling se-path* on the result fails with:
se-path*: contract violation
expected: xexpr?
given: '(*TOP* (*DECL* DOCTYPE html) "\n" (html (@ (itemscope) (itemtype "http://schema.org/QAPage")) "\n" (head "\n" "\n" (title "java - Compile and run Eclipse Project from command prompt - Stack Overflow") "\n" " " (link (@ (rel "shortcut icon") (h...
in: the 2nd argument of
(-> se-path? xexpr? any/c)
contract from: <collects>/xml/path.rkt
blaming: anonymous-module
at: <collects>/xml/path.rkt:74.2
So presumably (html->xexp port) is producing an invalid xexpr.
How might I go about debugging this? As I say, it's a big chunk of xexpr, but it looks OK to me. On other pages the code works as I'd expect, so it's definitely something about Stack Overflow pages, but I can't figure out what. And what should I do if I get a piece of almost-valid xexpr like this and want to clean it up so that things like se-path* work?
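One way to narrow this down might be to ask the xml library which part it objects to, rather than reading the whole tree by eye. The sketch below assumes that validate-xexpr raises exn:invalid-xexpr and that the exception's code field holds the offending sub-expression. The helper name first-invalid-part is hypothetical, made up for this example, and the sample inputs are not taken from the real page.

```racket
#lang racket
(require xml)

;; first-invalid-part is a hypothetical helper name, not part of any library.
;; Returns 'valid when v passes validate-xexpr; otherwise returns the
;; sub-expression that the raised exception reports as the culprit.
(define (first-invalid-part v)
  (with-handlers ([exn:invalid-xexpr? exn:invalid-xexpr-code])
    (validate-xexpr v)
    'valid))

;; Made-up inputs: a plain element, and one containing a boolean,
;; which is not a legal xexpr component.
(first-invalid-part '(p "fine"))
(first-invalid-part '(p "not fine" #t))
```

Calling (first-invalid-part xexp) on the value from the code above should then point at the first piece that xexpr? rejects, which would at least show what needs cleaning up.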