I encountered an error caused by non-Latin-1 characters in a URL passed to Drakma under SBCL, e.g.:

(drakma:http-request "http://www.youtube.com/„weird-url")

debugger invoked on a FLEXI-STREAMS:EXTERNAL-FORMAT-ENCODING-ERROR in thread
#<THREAD "initial thread" RUNNING {1002998D23}>:
  #\DOUBLE_LOW-9_QUOTATION_MARK (code 8222) is not a LATIN-1 character.

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [ABORT] Exit debugger, returning to top level.

(FLEXI-STREAMS::SIGNAL-ENCODING-ERROR
 #<FLEXI-STREAMS::FLEXI-LATIN-1-FORMAT (:ISO-8859-1 :EOL-STYLE :LF)
   {1002F196E3}>
 "~S (code ~A) is not a LATIN-1 character."
 #\DOUBLE_LOW-9_QUOTATION_MARK
 8222)

Apparently headers are defined by RFC 2616 to be sent in Latin-1 (this is the ticket I opened on GitHub after encountering this error), so the URL has to be properly encoded before it is passed to Drakma. But I have no clue how, since the offending character is not a Latin-1 character and so apparently cannot be encoded that way.

What would a working call look like for my example (apart from the fact that the URL is bogus and could be shortened to http://www.youtube.com)?

(drakma:http-request (magic-encoding-function "http://www.youtube.com/„weird-url"))
Sim

2 Answers

This problem is not caused by Drakma; the fault lies in PURI. I use my own fork of PURI: https://github.com/archimag/puri-unicode.

archimag
  • This still does not work correctly: the URL that should be transmitted is http://www.youtube.com/%e2%80%9eweird-url, but what Drakma actually sends is: GET /%25e2%2580%259eweird-url HTTP/1.1 – Sim Jun 07 '13 at 19:19

I just figured out that if the flaw lies in the post-processing of the newly instantiated URI object, a workaround might be to split the process into two parts:

  1. Construct the URI with only the Latin-1 part.
  2. Set the path.

It would look like this:

(let ((uri (puri:uri "https://wikimedia.org")))
  ;; Encode only the path segment; the separating "/" is added unencoded.
  (setf (puri:uri-path uri)
        (concatenate 'string "/" (drakma:url-encode "кадабра" :utf-8)))
  uri)

This produces:

#<PURI:URI https://wikimedia.org/%D0%BA%D0%B0%D0%B4%D0%B0%D0%B1%D1%80%D0%B0>

Drakma then accepts this URI without any additional processing.
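
For readers who want to see what the encoding step does, here is a minimal, dependency-free sketch of percent-encoding a path as UTF-8. The names utf8-octets and percent-encode-path are hypothetical helpers written for this illustration; they are not part of Drakma or PURI. The sketch assumes a Lisp with Unicode characters (such as SBCL) and a plain path with no query string.

```lisp
;; Hypothetical helpers for illustration; not part of Drakma or PURI.

(defun utf8-octets (char)
  "Return the UTF-8 octets of CHAR as a list of integers."
  (let ((code (char-code char)))
    (cond ((< code #x80) (list code))
          ((< code #x800)
           (list (logior #xC0 (ash code -6))
                 (logior #x80 (logand code #x3F))))
          ((< code #x10000)
           (list (logior #xE0 (ash code -12))
                 (logior #x80 (logand (ash code -6) #x3F))
                 (logior #x80 (logand code #x3F))))
          (t
           (list (logior #xF0 (ash code -18))
                 (logior #x80 (logand (ash code -12) #x3F))
                 (logior #x80 (logand (ash code -6) #x3F))
                 (logior #x80 (logand code #x3F)))))))

(defun percent-encode-path (path)
  "Percent-encode PATH as UTF-8, keeping \"/\" and ASCII unreserved characters."
  (with-output-to-string (out)
    (loop for char across path
          do (if (and (< (char-code char) 128)
                      (or (alphanumericp char) (find char "-._~/")))
                 ;; ASCII letters, digits, "-._~" and "/" pass through unchanged.
                 (write-char char out)
                 ;; Everything else is written as %XX for each UTF-8 octet.
                 (dolist (octet (utf8-octets char))
                   (format out "%~2,'0X" octet))))))
```

Calling (percent-encode-path "/„weird-url") should give a string that could then be set as the path in the same way as above, with (setf (puri:uri-path uri) ...). Note that a literal "%" is itself encoded as %25, so the function should only be applied once, to a path that has not been encoded yet.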

vityok