I am trying to read a series of web pages with CLISP, if they exist, but I don't understand how to use open-http so that it skips nonexistent pages. I have the following:

(dolist (word '(a b c))
  (with-open-stream (stream (ext:open-http
                              (format nil
                                      "https://en.wikipedia.org/wiki/~a.html"
                                      word)
                              :if-does-not-exist nil))
    (when stream
      (print word))))

I simply want to skip a web page if it doesn't exist, but CLISP seems to hang and then signals an "Invalid argument" error. Could anyone explain how the :if-does-not-exist argument works and/or provide examples of how to use open-http? Thanks!

Rainer Joswig
Leo
  • I think you are going to have problems with other things in the process: almost all websites use HTTPS nowadays, and CLISP's `OPEN-HTTP` doesn't support HTTPS. Also, as far as I can see, Wikipedia's articles are not stored in *.html files, and other files have more complicated paths. – mobiuseng Mar 15 '16 at 07:21
  • Related: [Is there a Wikipedia API?](http://stackoverflow.com/q/627594/124319). Also, have a look at [Drakma](http://weitz.de/drakma/) – coredump Mar 15 '16 at 08:54
  • Thanks for the replies so far, but please focus on the question: how do you prevent open-http from hanging when the page does not exist, irrespective of its URL? An example would suffice. – Leo Mar 15 '16 at 12:39

1 Answer

It does work for me:

(with-open-stream (stream (ext:open-http
                           "http://stackoverflow.com/questions/234242424242"
                           :if-does-not-exist nil))
  (format t "~&Stream: ~A~%" stream))

Output:

;; connecting to "http://stackoverflow.com/questions/234242424242"...connected...HTTP/1.1 404 Not Found
;; HTML source of Page not found
Stream: NIL
NIL

There is a delay while the connection is established, but it works.

If the page does exist:

[7]> (with-open-stream (stream (ext:open-http
                                "http://stackoverflow.com/questions/36003343/clisp-open-http-example"
                                :if-does-not-exist nil))
       (format t "~&Stream: ~A~%" stream))
;; connecting to "http://stackoverflow.com/questions/36003343/clisp-open-http-example"...connected...HTTP/1.1 200 OK
Stream: #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER stackoverflow.com:80>
NIL
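
Applied to a loop like the one in the question, this could look roughly as follows. It is only a sketch for CLISP: the function name `report-pages` is made up for illustration, and it assumes plain `http://` URLs that answer without a redirect. It relies on `:if-does-not-exist nil` returning NIL for a missing page, as shown above, and closes every stream it does get.

```lisp
;; CLISP only: EXT:OPEN-HTTP is a CLISP extension, not standard Common Lisp.
;; REPORT-PAGES is a hypothetical helper name used for this sketch.
(defun report-pages (urls)
  "Print each URL in URLS together with whether EXT:OPEN-HTTP returned a stream."
  (dolist (url urls)
    ;; With :IF-DOES-NOT-EXIST NIL, a missing page yields NIL instead of an error.
    (let ((stream (ext:open-http url :if-does-not-exist nil)))
      (if stream
          ;; UNWIND-PROTECT makes sure the socket is closed even on a non-local exit.
          (unwind-protect
               (format t "~&Found:   ~A~%" url)
            (close stream))
          (format t "~&Skipped: ~A~%" url)))))
```

For example, `(report-pages '("http://stackoverflow.com/questions/234242424242" "http://stackoverflow.com/questions/36003343/clisp-open-http-example"))` would try the two URLs used in this answer, one after the other.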

With Wikipedia I couldn't make it work, because wikipedia.org redirects to HTTPS, and EXT:OPEN-HTTP can neither handle HTTPS directly nor follow that redirect correctly.

Here is what happens if HTTPS is used directly:

[10]> (with-open-stream (stream (ext:open-http
                                 "https://en.wikipedia.org/wiki/Common_Lisp"
                                 :if-does-not-exist nil))
        (format t "~&Stream: ~A~%" stream))

*** - OPEN-HTTP: "https://en.wikipedia.org/wiki/Common_Lisp" is not an HTTP URL
The following restarts are available:
ABORT          :R1      Abort main loop
Break 1 [11]> :r1

If "https" is replaced by "http", CLISP tries to follow the redirect but doesn't construct a proper address from it:

[12]> (with-open-stream (stream (ext:open-http
                                 "http://en.wikipedia.org/wiki/Common_Lisp"
                                 :if-does-not-exist nil))
        (format t "~&Stream: ~A~%" stream))
;; connecting to "http://en.wikipedia.org/wiki/Common_Lisp"...connected...HTTP/1.1 301 TLS Redirect --> "https://en.wikipedia.org/wiki/Common_Lisp"
;; connecting to "http://en.wikipedia.orghttps://en.wikipedia.org/wiki/Common_Lisp"...
*** - PARSE-INTEGER: substring "" does not have integer syntax at position 0
The following restarts are available:
ABORT          :R1      Abort main loop
Break 1 [13]> 
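
Both failures above are ordinary errors, so if the goal is just to skip such pages rather than land in the debugger, one option is to wrap the call in HANDLER-CASE. The following is a sketch, not a fix for OPEN-HTTP itself: `try-open-page` is a hypothetical helper name, and it does nothing about the connection delay or about a request that truly hangs.

```lisp
;; CLISP only: EXT:OPEN-HTTP is a CLISP extension, not standard Common Lisp.
;; TRY-OPEN-PAGE is a hypothetical helper name used for this sketch.
(defun try-open-page (url)
  "Return an input stream for URL, or NIL if the page is missing or opening it signals an error."
  (handler-case (ext:open-http url :if-does-not-exist nil)
    ;; Covers cases such as the \"is not an HTTP URL\" and PARSE-INTEGER errors shown above.
    (error (condition)
      (format *error-output* "~&;; skipping ~A: ~A~%" url condition)
      nil)))
```

A caller would then test the result the same way as before, for example `(let ((stream (try-open-page url))) (when stream (unwind-protect (print url) (close stream))))`, where `url` is whatever address is being checked.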
mobiuseng