4

I am writing a program which scrapes data from a list of websites and stores it into a struct called Listing which is then collected into a final struct called Listings.

use std::{ thread,
           sync::{ Arc, Mutex }
         };

fn main() {
    // ... some declarations
    let sites_count = site_list.len(); // site_list is a vector containing the list of websites

    // The variable to be updated by the thread instances ( `Listing` is a struct holding the information ) 
    let listings: Arc<Mutex<Vec<Vec<types::Listing<String>>>>> = Arc::new(Mutex::new(Vec::new()));

    // A vector containing all the JoinHandles for the spawned threads
    let mut fetch_handle: Vec<thread::JoinHandle<()>> = Vec::new();

    // Spawn a thread for each concurrent website
    for i in 0..sites_count { 
        let slist = Arc::clone(&site_list);
        let listng = Arc::clone(&listings);
        fetch_handle.push(
            thread::spawn(move || {
                println!("⌛ Spawned Thread: {}",i);
                let site_profile = read_profile(&slist[i]);
                let results = function1(function(2)) // A long list of functions from a submodule that make the http request and parse the data into `Listing`
                listng.lock().unwrap().push(results);
            }));
    }
    
    for thread in fetch_handle.iter_mut() { 
        thread.join().unwrap();
    }

    // This is the one line version of the above for loop - yields the same error.
    // fetch_handle.iter().map(|thread| thread.join().unwrap()); 

    // The final println to just test feed the target struct `Listings` with the values
    println!("{}",types::Listings{ date_time: format!("{}", chrono::offset::Local::now()),
                                   category: category.to_string(),
                                   query: (&search_query).to_string(),
                                   listings: listings.lock().unwrap() // It prevents me from owning this variable
                                 }.to_json());
}

To which I stumble upon the error

error[E0507]: cannot move out of `*thread` which is behind a mutable reference
   --> src/main.rs:112:9
    |
112 |         thread.join().unwrap();
    |         ^^^^^^ move occurs because `*thread` has type `JoinHandle<()>`, which does not implement the `Copy` trait

It prevents me from owning the variable after the thread.join() for loop.

When I tried assigning to check the output type

let all_listings = listings.lock().unwrap()

all_listings reports a type of MutexGuard(which is also true inside the thread for loop, but it allows me to call vector methods on it) and wouldn't allow me to own the data. I changed the data type in the Listings struct to hold a reference instead of owning it. But it seems so the operations I perform on the struct in .to_json() require me to own its value. The type declaration for listings inside the Listings Struct is Vec<Vec<Listing<T>>.

This code however works just fine when I move the .join().unwrap() to the end of thread::spawn() block or apply to its handle inside the for loop(whilst disabling the external .join() ). But that makes all the threads execute in a chain which is not desirable, since the main intention of using threads was to execute same functions with different data values simultaneously.

I am quite new to Rust in general(been 3 weeks since I am using it) and its my first time ever implementing Multithreading. I have only ever written single threaded programs in java and python before this, so if possible be a little noob friendly. However any help is appreciated :) .

Kevin Anderson
  • 6,850
  • 4
  • 32
  • 54
Rakshit Shetty
  • 109
  • 1
  • 11
  • 4
    change `for thread in fetch_handle.iter_mut() { ` into `for thread in fetch_handle.into_iter() { ` – PiRocks Aug 28 '21 at 18:07
  • 1
    I'm not experienced enough to figure out exactly what you have to do, but @PiRocks 's suggestion will probably work because the `JoinHandle<()>` is then moved into the current iteration entirely. Your way *probably* doesn't work because when you `join` and `unwrap` (I'm not sure which part is more important), what's left in the vector? This is a point of lifetimes I myself am unsure on, but regardless, that's probably the "source" of the error. `into_iter` "destroys" the original vector, allowing unambiguous ownership post-join. I'm not sure of the right way to do what you want. – Kevin Anderson Aug 28 '21 at 19:07
  • 3
    Just `for thread in fetch_handle` should work because `Vec` implements `IntoIterator`. You only need `into_iter()` for the one-line version. – trent Aug 28 '21 at 20:57
  • 1
    See also [How to take ownership of T from Arc>?](https://stackoverflow.com/questions/29177449/how-to-take-ownership-of-t-from-arcmutext) You probably don't want to use `.lock().unwrap()` at the end there because `lock` borrows the item inside the mutex instead of moving it out (as `into_inner` does). – trent Aug 28 '21 at 21:06
  • @PiRocks yes that seems to fix the problem altogether. @Kevin okay. I havent yet gotten into any of the into* methods. I don't fully understand what it does yet but I will look it up. @trentcl I tried `into_iter()` in the one-line version but it pops up with a warning `Unused Map that must be used, iterators are lazy and do nothing until consumed`. @trentcl ok. So only use `.lock().unwrap()` when inside a thread and use the `try_unwrap()` and `.into_inner()` combo when mutex is no longer needed, got it. Thank You @PiRocks @Kevin and @trentcl ! – Rakshit Shetty Aug 29 '21 at 07:45

2 Answers2

8

I figured out what needed to happen. First, for this kind of thing, I agree that into_iter does what you want, but it IMO it obscures why. The why is that when you borrow on it, it doesn't own the value, which is necessary for the join() method on the JoinHandle<()> struct. You'll note its signature takes self and not &mut self or anything like that. So it needs the real object there.

To do that, you need to get your object out of the Vec<thread::JoinHandle<()>> that it's inside. As stated, into_iter does this, because it "destroys" the existing Vec and takes it over, so it fully owns the contents, and the iteration returns the "actual" objects to be joined without a copy. But you can also own the contents one at a time with remove as demonstrated below:

while fetch_handle.len() > 0 {
    let cur_thread = fetch_handle.remove(0); // moves it into cur_thread
    cur_thread.join().unwrap();
}

This is instead of your for loop above. The complete example in the playground is linked if you want to try that.

I hope this is clearer on how to work with things that can't be copied, but methods need to fully own them, and the issues in getting them out of collections. Imagine if you needed to end just one of those threads, and you knew which one to end, but didn't want to end them all? Vec<_>::remove would work, but into_iter would not.

Thank you for asking a question which made me think, and prompted me to go look up the answer (and try it) myself. I'm still learning Rust as well, so this helped a lot.

Edit:

Another way to do it with pop() and while let:

while let Some(cur_thread) = fetch_handle.pop() {
    cur_thread.join().unwrap();
}

This goes through it from the end (pop pulls it off of the end, not the front), but doesn't reallocate or move the vector contents via pulling it off the front either.

Kevin Anderson
  • 6,850
  • 4
  • 32
  • 54
  • 2
    This clears it up even better than my own answer. I'll accept this answer. – Rakshit Shetty Aug 30 '21 at 08:23
  • 1
    I would personally use `for cur_thread in fetch_handle.drain(..)`. But there are many ways to Rome. – Arthur Jan 11 '22 at 09:56
  • 1
    @Arthur If using `.drain(..)` then you may as well just use `into_iter`. Drain sounds good if you're only taking part of it, but I don't see the benefit of that if you're draining the entire thing. Maybe because it's a usable `Vec` after `drain` is used whereas `into_iter` does a move on the original `Vec`, but that seems like an iffy case in general, though may come up in some specific cases. – Kevin Anderson Jan 12 '22 at 15:32
  • 1
    @KevinAnderson I think it's more explicit. You can more easily tell from just the name, without touching the documentation, that the elements should be gone after the loop is done. Which means that the loop _could_ take ownership, which means that it probably does take ownership. Then one can read the documentation and see that this is indeed the case. – Arthur Jan 12 '22 at 16:18
3

Okay so the problem as pointed out by @PiRocks seems to be in the for loop that joins the threads.

 for thread in fetch_handle.iter_mut() {
        thread.join().unwrap();
    }

The problem is the iter_mut(). Using into_iter() instead

 for thread in fetch_handle.into_iter() {
        thread.join().unwrap();
    }

yields no errors and the program runs across the threads simultaneously as required.

The explanation to this, as given by @Kevin Anderson is:

Using into_iter() causes JoinHandle<()> to move into the for loop.

Also looking into the docs(std::iter) I found that iter() and iter_mut() iterate over a reference of self whereas into_iter() iterates over self directly(owning it).

So iter_mut() was iterating over &mut thread::JoinHandle<()> instead of thread::JoinHandle<()>.

Rakshit Shetty
  • 109
  • 1
  • 11