I'm using Rust to download huge amounts of stock market data, around 50,000 GET requests per cycle. To speed the process up significantly, I've turned to multithreading. My code so far looks like this:
use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

// Instantiate a channel so threads can send data to the main thread
let (s, r) = channel();
// Vector to store all threads created
let mut threads = Vec::new();
// Iterate through every security in the universe
for security in universe {
    // Clone the sender so this thread gets its own handle
    let thread_send = s.clone();
    // Create a thread with a closure that makes 5 GET requests for the current security
    let t = thread::spawn(move || {
        // Download the 5 price vectors and send everything in a tuple to the main thread
        let price_vectors = download_security(&security);
        let tuple = (security, price_vectors.0, price_vectors.1, price_vectors.2, price_vectors.3, price_vectors.4);
        thread_send.send(tuple).unwrap();
    });
    // PAUSE THE MAIN THREAD BECAUSE OF THE ERROR I'M GETTING
    thread::sleep(Duration::from_millis(20));
    // Add the new thread to the threads vector
    threads.push(t);
}
// Drop the original sender so the receiver ends once every thread is done
drop(s);
// Join all the threads so the main thread waits for their completion
for t in threads {
    t.join().unwrap();
}
The download_security() function that each thread calls simply makes 5 GET requests to download price data (minutely, hourly, daily, weekly, and monthly data). I'm using the ureq crate to make these requests. The download_security() function looks like this:
// Call minutely data and let thread sleep for arbitrary amount of time
let minute_text = ureq::get(&minute_link).call().unwrap().into_string().unwrap();
thread::sleep(Duration::from_millis(1000));
// Call hourly data and let thread sleep for arbitrary amount of time
let hour_text = ureq::get(&hour_link).call().unwrap().into_string().unwrap();
thread::sleep(Duration::from_millis(1000));
// Call daily data and let thread sleep for arbitrary amount of time
let day_text = ureq::get(&day_link).call().unwrap().into_string().unwrap();
thread::sleep(Duration::from_millis(1000));
// Call weekly data and let thread sleep for arbitrary amount of time
let week_text = ureq::get(&week_link).call().unwrap().into_string().unwrap();
thread::sleep(Duration::from_millis(1000));
// Call monthly data and let thread sleep for arbitrary amount of time
let month_text = ureq::get(&month_link).call().unwrap().into_string().unwrap();
thread::sleep(Duration::from_millis(1000));
Now, the reason I'm putting my threads to sleep throughout this code is that whenever I make too many HTTP requests too fast, I seem to get this strange error:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Transport(Transport { kind: Dns, message: None, url: Some(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.polygon.io")), port: None, path: "/v2/aggs/ticker/SPHB/range/1/minute/2021-05-22/2021-10-22", query: Some("adjusted=true&sort=asc&limit=200&apiKey=wo_oZg8qxLYzwo3owc6mQ1EIOp7yCr0g"), fragment: None }), source: Some(Custom { kind: Uncategorized, error: "failed to lookup address information: nodename nor servname provided, or not known" }) })', src/main.rs:243:54
When I increase the amount of time that my main thread sleeps after creating a new subthread, OR the amount of time that my subthreads sleep after making each of the 5 GET requests, the number of these errors goes down. When the sleeps are too short, I'll see this error printed out for 90%+ of the securities I try to download. When the sleeps are longer, everything works perfectly, except that the process takes WAY too long. This is frustrating because I need this process to be as fast as possible, preferably <1 minute for all 10,000 securities.
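What I suspect I really need, instead of the sleeps, is a way to cap how many requests are in flight at once. This is a rough, untested sketch of the kind of fixed worker pool I have in mind (WORKER_COUNT is an arbitrary guess, and universe/download_security are the same as in my code above):

use std::sync::mpsc::channel;
use std::thread;

// Arbitrary guess at how many downloads to allow in flight at once
const WORKER_COUNT: usize = 50;

let (s, r) = channel();
// Split the universe into one chunk per worker (rounding up)
let chunk_size = (universe.len() + WORKER_COUNT - 1) / WORKER_COUNT;
let mut workers = Vec::new();
for chunk in universe.chunks(chunk_size) {
    // Give each worker an owned copy of its slice of the universe
    let chunk = chunk.to_vec();
    let sender = s.clone();
    workers.push(thread::spawn(move || {
        // Each worker downloads its securities sequentially, so at most
        // WORKER_COUNT securities are being fetched at any moment
        for security in chunk {
            let prices = download_security(&security);
            sender.send((security, prices)).unwrap();
        }
    }));
}
// Drop the original sender so the receiver sees the channel close
drop(s);
for w in workers {
    w.join().unwrap();
}

With something like this I could tune WORKER_COUNT up or down directly, instead of guessing at sleep durations.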
I'm running macOS Big Sur on an M1 Mac Mini. Is there some kind of fundamental limit in my OS on how many GET requests I can make per second?
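One other thing I've been wondering about: the panic is a DNS error, and as far as I understand it, each bare ureq::get() call opens a brand-new connection, which means a fresh lookup of api.polygon.io every time. Would reusing a single ureq::Agent inside each thread, so the five calls share one pooled keep-alive connection, take pressure off the resolver? A rough sketch of what I mean (untested; fetch and download_with_agent are just illustrative names, not my real code):

use std::time::Duration;

// Fetch one URL through a shared Agent, so its connection pool is reused
fn fetch(agent: &ureq::Agent, url: &str) -> Result<String, Box<dyn std::error::Error>> {
    Ok(agent.get(url).call()?.into_string()?)
}

// Download all five price series (minutely, hourly, daily, weekly, monthly)
// over a single Agent instead of five independent connections
fn download_with_agent(links: &[String]) -> Result<Vec<String>, Box<dyn std::error::Error>> {
    let agent = ureq::AgentBuilder::new()
        .timeout(Duration::from_secs(10))
        .build();
    links.iter().map(|link| fetch(&agent, link)).collect()
}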
Any help would be greatly appreciated.
Thank you!