I am rewriting part of my API from Python to Rust. In particular, I am trying to make an HTTP request to an OSRM server to get a big distance matrix. This kind of request can have quite a large URL. In Python everything works fine, but in Rust I get an error: `thread 'tokio-runtime-worker' panicked at 'a parsed Url should always be a valid Uri: InvalidUri(TooLong)'`

I have tried several HTTP client libraries: reqwest, surf, isahc, awc. But it turns out that the limiting logic is located in the URI-handling library https://github.com/hyperium/http (the `http` crate), and most HTTP clients depend on it, so they all behave the same. Some libraries I could not use at all; with awc, for example, I got compile-time errors with my async code.

Is there any way to send a large GET request from Rust, preferably asynchronously?

Dimitrius
  • Btw, such long URIs are not supported by the vast majority of software around the world. Most browsers limit it to 2000 chars. Most CDNs accept at most 16 KB. You are going way beyond accepted standards; I advise rethinking that design. Why is that URI so long to begin with? Why can't you move some of that data to headers or body? – freakish Feb 02 '23 at 12:35
  • @freakish thank you for commenting. This is a problem with the OSRM API. They used to support POST requests, but then they removed POST support. They recommend using libosrm instead of the server, but this is quite a problematic architectural change: https://github.com/Project-OSRM/osrm-backend/issues/4211 – Dimitrius Feb 02 '23 at 12:42
  • @Dimitrius you might find happiness in the `curl` crate itself; I would expect it to have a wider compatibility range for odd situations. – Masklinn Feb 02 '23 at 14:03
  • See also: https://github.com/hyperium/http/issues/462 – Masklinn Feb 02 '23 at 14:05
  • @Masklinn the `curl` crate works! Unfortunately, it only provides a blocking API, and this request can hang in my event loop for minutes if not hours. But anyway, this is very helpful, thank you. – Dimitrius Feb 02 '23 at 15:14
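
A note on the blocking API mentioned in the last comment: a blocking call is usually kept off the event loop by running it on a separate thread and handing the result back through a channel. The std-only sketch below illustrates that idea. `blocking_fetch` is a hypothetical stand-in for a real `curl` transfer (it only sleeps and returns a fake body), and the host, port and coordinates in the URL are illustrative assumptions. In a Tokio application, `tokio::task::spawn_blocking` would likely play the role of `thread::spawn` here.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a blocking `curl` transfer: it only sleeps
/// and returns a fake body, so the sketch runs without a network.
fn blocking_fetch(url: &str) -> String {
    thread::sleep(Duration::from_millis(50));
    format!("response for a {}-byte URL", url.len())
}

/// Runs the blocking call on its own thread and returns a receiver,
/// so the caller stays free to do other work in the meantime.
fn fetch_in_background(url: String) -> mpsc::Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // A send error only means the caller dropped the receiver.
        let _ = tx.send(blocking_fetch(&url));
    });
    rx
}

fn main() {
    // Illustrative URL only; the coordinates are made up.
    let url = format!(
        "http://localhost:5000/table/v1/driving/{}",
        "1,2;".repeat(20_000)
    );
    let rx = fetch_in_background(url);
    println!("request started, caller not blocked");

    // Block only at the point where the result is actually needed.
    let body = rx.recv().expect("worker thread ended without a result");
    println!("{}", body);
}
```

The trade-off is one occupied thread per in-flight request, which is normally acceptable for a few long-running matrix requests but not for thousands of concurrent ones.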

1 Answer

As freakish pointed out in the comments already, having such a long URL is a bad idea, anything longer than 2,000 characters won't work in most browsers.

That being said: In the comments, you stated that an external API wants those crazily long URIs, so you don't really have an alternative. Therefore, let's give this problem a shot.


It looks like the limitation to 65,534 bytes exists because the http library stores the position of the query string as a u16 (and uses 65,535 as a sentinel if there is no query part). The following patch seems to make the code use u32 instead, thereby raising the limit to 4,294,967,294 bytes. If you have longer URIs than that, you might be able to use u64 instead, but that would be a URI longer than 4 GB, and I doubt you need this:

--- a/src/uri/mod.rs
+++ b/src/uri/mod.rs
@@ -141,7 +141,7 @@ enum ErrorKind {
 }
 
 // u16::MAX is reserved for None
-const MAX_LEN: usize = (u16::MAX - 1) as usize;
+const MAX_LEN: usize = (u32::MAX - 1) as usize;
 
 // URI_CHARS is a table of valid characters in a URI. An entry in the table is
 // 0 for invalid characters. For valid characters the entry is itself (i.e.
diff --git a/src/uri/path.rs b/src/uri/path.rs
index be2cb65..9abec4c 100644
--- a/src/uri/path.rs
+++ b/src/uri/path.rs
@@ -11,10 +11,10 @@ use crate::byte_str::ByteStr;
 #[derive(Clone)]
 pub struct PathAndQuery {
     pub(super) data: ByteStr,
-    pub(super) query: u16,
+    pub(super) query: u32,
 }
 
-const NONE: u16 = ::std::u16::MAX;
+const NONE: u32 = ::std::u32::MAX;
 
 impl PathAndQuery {
     // Not public while `bytes` is unstable.
@@ -32,7 +32,7 @@ impl PathAndQuery {
                 match b {
                     b'?' => {
                         debug_assert_eq!(query, NONE);
-                        query = i as u16;
+                        query = i as u32;
                         break;
                     }
                     b'#' => {
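
For intuition about where the 65,534-byte ceiling comes from, the sentinel scheme can be modelled in a few lines of std-only Rust. This is a simplified sketch, not the crate's actual code: the `PathAndQuery` type and `parse` function here are stand-ins written for illustration, and the OSRM-style paths are made-up examples.

```rust
/// Sentinel meaning "no query string"; plays the role of `NONE` in the patch.
const NONE: u16 = u16::MAX;

/// Largest accepted length, so every valid index stays below the sentinel.
const MAX_LEN: usize = (u16::MAX - 1) as usize;

/// Simplified stand-in for the crate's type: the raw text plus the
/// byte offset of '?' (or `NONE` if there is no query part).
#[derive(Debug, PartialEq)]
struct PathAndQuery {
    data: String,
    query: u16,
}

fn parse(input: &str) -> Result<PathAndQuery, &'static str> {
    if input.len() > MAX_LEN {
        return Err("TooLong");
    }
    let mut query = NONE;
    for (i, b) in input.bytes().enumerate() {
        if b == b'?' {
            // Cannot truncate: i < input.len() <= MAX_LEN < u16::MAX.
            query = i as u16;
            break;
        }
    }
    Ok(PathAndQuery { data: input.to_owned(), query })
}

impl PathAndQuery {
    /// Returns the text after '?', if a query part was recorded.
    fn query(&self) -> Option<&str> {
        if self.query == NONE {
            None
        } else {
            Some(&self.data[self.query as usize + 1..])
        }
    }
}

fn main() {
    let short = parse("/table/v1/driving/1,2;3,4?annotations=distance").unwrap();
    println!("{:?}", short.query());

    // 80,000 bytes of made-up coordinates exceed the u16-based limit.
    let long = format!("/table/v1/driving/{}", "1,2;".repeat(20_000));
    println!("{:?}", parse(&long).err());
}
```

Widening `query` and `NONE` to u32, as the patch does, moves the same ceiling to `u32::MAX - 1` without changing the scheme.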

You could try to get this merged upstream; however, the issue covering this problem sounds like a pull request might not be accepted. Depending on your use case, you could fork the repository, commit the fix, and then use Cargo's dependency-overriding feature to make Cargo use your patched version instead of the one published on crates.io. The following addition to your Cargo.toml might get you started:

[patch.crates-io]
http = { git = 'https://github.com/your/repository' }

Note, however, that this only overrides the current version of the http crate: as soon as a new version of the original crate is published, it will probably be chosen by Cargo until you update your fork.

Elias Holzmann
  • "having such a long URL is a bad idea, anything longer than 2,000 characters won't work in most browsers." FWIW that's a historical limit of MSIE; most browsers actually have a much higher limit. Firefox and legacy Opera had unlimited URL sizes (though the Firefox URL bar would stop displaying URLs longer than 64k), Safari is limited to ~80k, and Chrome is a megabyte or two. Most web servers have a default configuration at 4k to 8k (technically it's the limit of the request line, so it excludes the domain but includes the HTTP/1.1 stricture). – Masklinn Feb 02 '23 at 16:43
  • Here again Microsoft is the odd duck, as IIS has a default limit of 2k, matching MSIE I think (and it used to be much, much lower, something like 256 or 512 bytes). – Masklinn Feb 02 '23 at 16:44