Can anyone tell me what's going on here?

#lang racket

(require (planet neil/html-parsing:2:0)
         net/url xml/path xml)

(define page "http://stackoverflow.com/questions/18902934/compile-and-run-eclipse-project-from-command-prompt")
(define port (get-pure-port (string->url page)))
(define xexp (html->xexp port))

(displayln xexp)

(xexpr? xexp)

(define title (se-path* '(title) xexp))

(displayln title)

It downloads an HTML page and seems to convert it into an xexpr, in the sense that html->xexp doesn't fail and the result looks like what I'd expect.

But (xexpr? xexp) returns #f, and calling se-path* on it fails with:

 se-path*: contract violation
 expected: xexpr?
 given: '(*TOP* (*DECL* DOCTYPE html) "\n" (html (@ (itemscope) (itemtype "http://schema.org/QAPage")) "\n" (head "\n" "\n" (title "java - Compile and run Eclipse Project from command prompt - Stack Overflow") "\n" "    " (link (@ (rel "shortcut icon") (h...
 in: the 2nd argument of
      (-> se-path? xexpr? any/c)
 contract from: <collects>/xml/path.rkt
 blaming: anonymous-module
 at: <collects>/xml/path.rkt:74.2

So presumably (html->xexp port) is producing an invalid xexpr.

How might I go about debugging this? As I say, it's a big chunk of xexpr, but it looks OK. On other pages the code works as I'd expect, so it's definitely something about Stack Overflow pages, but I can't figure out what. And what should I do if I get a piece of almost-OK xexpr like this and want to clean it up so that things like se-path* work?

interstar

1 Answer

Ah ... this seems to explain it:

http://www.neilvandyke.org/racket/sxml-intro/

The SXML/xexp format that html->xexp produces isn't the same as the xexpr format that the xml library (and so se-path*) expects.
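To make the difference concrete, here is a minimal sketch of a conversion from the SXML shape shown in the error message (a *TOP* wrapper, a *DECL* node, attributes under @) to the xexpr shape. The names sxml->xexpr, sxml-node->xexprs and sxml-attr->xexpr-attr are my own hypothetical helpers, not part of html-parsing or xml. It also makes some assumptions: declarations, comments, processing instructions and entity references are simply dropped, and a valueless attribute like (itemscope) is given its own name as a value. Treat it as a starting point rather than a complete converter.

```racket
#lang racket
(require xml xml/path)

;; Hypothetical helper: turn one SXML attribute into an xexpr attribute.
;; Assumption: a valueless attribute such as (itemscope) gets its own
;; name as its value, because xexpr attributes must carry a string.
(define (sxml-attr->xexpr-attr attr)
  (match attr
    [(list name) (list name (symbol->string name))]
    [(list name value) (list name (~a value))]))

;; Hypothetical helper: convert one SXML node into a list of zero or
;; more xexprs, so that dropped nodes simply vanish from the result.
(define (sxml-node->xexprs node)
  (match node
    [(? string?) (list node)]
    ;; Assumption: declarations, comments, PIs and entity refs are dropped.
    [(list (or '*DECL* '*COMMENT* '*PI* '&) _ ...) '()]
    ;; The *TOP* wrapper is spliced away, leaving its children.
    [(list '*TOP* kids ...) (append-map sxml-node->xexprs kids)]
    ;; Element with an SXML attribute list: (tag (@ (k "v") ...) kid ...)
    [(list (? symbol? tag) (list '@ attrs ...) kids ...)
     (list `(,tag ,(map sxml-attr->xexpr-attr attrs)
                  ,@(append-map sxml-node->xexprs kids)))]
    ;; Element without attributes.
    [(list (? symbol? tag) kids ...)
     (list (cons tag (append-map sxml-node->xexprs kids)))]
    ;; Anything unrecognised is dropped.
    [_ '()]))

;; Hypothetical entry point: return the first element of the document.
(define (sxml->xexpr doc)
  (define elements (filter pair? (sxml-node->xexprs doc)))
  (when (null? elements)
    (error 'sxml->xexpr "no element found in document"))
  (first elements))

;; A small hand-written sample in the same shape as the real output.
(define sample
  '(*TOP* (*DECL* DOCTYPE html) "\n"
          (html (@ (itemscope) (itemtype "http://schema.org/QAPage"))
                (head (title "Example title")))))

(define converted (sxml->xexpr sample))
(xexpr? converted)            ; #t
(se-path* '(title) converted) ; "Example title"
```

With the code from the question, this would be used as (se-path* '(title) (sxml->xexpr xexp)). I have only tried to cover the node kinds visible in the error message, so a real page may need more cases.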

But it seems to be a common confusion, so I'll leave the question here in case anyone else comes across a similar problem.
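On the debugging side of the question, one option that may help is validate-xexpr from the xml library. Unlike xexpr?, it raises an exn:invalid-xexpr exception that carries the offending sub-term. The sketch below wraps that in a small function; first-invalid-part is a hypothetical name of my own, and the sample inputs are made up for illustration.

```racket
#lang racket
(require xml)

;; Hypothetical helper: return the sub-term that stops v from being a
;; valid xexpr, or #f when v validates.
(define (first-invalid-part v)
  (with-handlers ([exn:invalid-xexpr? exn:invalid-xexpr-code])
    (validate-xexpr v)
    #f))

(first-invalid-part '(p "fine"))          ; #f, nothing wrong
(first-invalid-part '(p "text" (b 1.5)))  ; reports the offending sub-term
```

Calling (first-invalid-part xexp) on the value from the question should point at whichever part of the page the xml library rejects, which is easier than reading the whole tree by eye.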

interstar