This is a program that does a directory tree listing using asynchronous tasks in C++.

My problem: in each function call the variable 'vect' is created as a local variable, and each call only holds the list of files in one directory, yet at the end all the files in all the directories are returned to main! How is that possible?

I mean, how come the 'vect' variable, which is local to each function call, ends up holding the file names of every directory, even though each directory was listed by a separate function call?! This 'vect' acts as if it were a global variable. Is it because of "std::copy"? I don't understand it!

#include <algorithm>
#include <filesystem>
#include <future>
#include <iostream>
#include <iterator>  // std::back_inserter
#include <utility>   // std::move
#include <vector>

typedef std::vector<std::filesystem::directory_entry> vectDirEntry;

vectDirEntry ListDirectory2(std::filesystem::directory_entry&& dirPath)
{
    std::vector<std::future<vectDirEntry>> finalVect; // one future per subdirectory
    vectDirEntry vect;                                // local to this call

    for (const std::filesystem::directory_entry& entry : std::filesystem::directory_iterator(dirPath))
    {
        if (entry.is_directory())
        {
            // List the subdirectory in its own asynchronous task.
            std::future<vectDirEntry> fut = std::async(std::launch::async, &ListDirectory2, entry);
            finalVect.push_back(std::move(fut));
        }
        else if (entry.is_regular_file())
        {
            vect.push_back(entry);
        }
    }

    // Wait for each subdirectory task and append the vector it returned to vect.
    std::for_each(finalVect.begin(), finalVect.end(), [&](std::future<vectDirEntry>& fut)
        {
            vectDirEntry lst = fut.get();
            std::copy(lst.begin(), lst.end(), std::back_inserter(vect));
        });
    return vect;
}


int main()
{
    const std::filesystem::directory_entry root = std::filesystem::directory_entry("C:/Test");
    std::future<vectDirEntry> fut = std::async(std::launch::async, &ListDirectory2, root);
    auto result = fut.get();

    for (const std::filesystem::directory_entry& item : result)
    {
        std::cout << item.path() << '\n';
    }
}
Sami
  • Probably won't get much extra speed out of this and might actually slow things down. All those threads will fight it out over access to the file system, so unless you've got a really groovy filesystem that can handle all of those threads, you are back to serial operation PLUS threading overhead PLUS any side effects from the fighting. – user4581301 Jan 31 '23 at 15:56
  • Well, your response is too OFF-TOPIC! It is just practice, by the way!! – Sami Jan 31 '23 at 15:58
  • Side-note: Why does `ListDirectory2` accept its argument by r-value reference? You're not going to mutate it, and you're not going to move from it. It doesn't even make sense given how you invoke it: you passed a `const` argument, so even if `std::async` didn't interfere with argument passing to disable implicit reference semantics (and I believe it does, like `std::thread` does), it *can't* work without a copy, since you told it to receive an r-value and passed it a `const` l-value. – ShadowRanger Jan 31 '23 at 16:02
  • @passionateProgrammer: There's no call for yelling when someone is trying to be helpful. Practice is all well and good, but you need to be aware your choice of practice exercise is not one you will ever want to use in real code (it introduces complexity for no benefit). – ShadowRanger Jan 31 '23 at 16:04
  • @ShadowRanger How does std::copy have anything to do with passing a directory name by r-value reference?! std::copy is using the 'vect' variable! – Sami Jan 31 '23 at 16:08
  • @passionateProgrammer Some comments are there to help you improve and don't necessarily solve the issue you have. – Ted Lyngmo Jan 31 '23 at 16:09
  • I thought the whole purpose of the program was to collect files recursively, since you've written it to take the results from the recursive calls and populate each `vect` above them. What _was_ the purpose of doing that if the result is surprising? – Ted Lyngmo Jan 31 '23 at 16:11
  • ... and then _"Is it because of "std::copy"?"_ - Yes, each `std::copy` copies the result from one `async` call into `vect`, like ShadowRanger explains below. – Ted Lyngmo Jan 31 '23 at 16:17
  • Funny note: I used to prefix my off-topic comments with *Off topic:*, but I stopped when I found people thought I was saying their question was off topic. Usually I do what ShadowRanger did and prefix with *Side Note:* or *Warning:*. Not sure why I didn't do so this time. Apologies if I came off sounding judgmental. The comment was intended as a warning that if the goal was a speed-up, the most likely reason to pull in threads, the results were going to be disappointing. – user4581301 Jan 31 '23 at 17:58
  • @user4581301 -- New posters should realize that the comment section is for comments, not answers. It's similar to this -- if you were to go to a swim instructor, and wanted to learn how to do a backstroke, but your overall swim technique was awful, wouldn't it be responsible for the instructor to point out the general flaws in your swimming that have nothing to do with the backstroke? To that point -- there have been posters who appreciated the "off-topic" comments, to the extent that those comments solved their problem with an approach they had not been considering. – PaulMcKenzie Jan 31 '23 at 18:20
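The side-note about the r-value reference parameter can be checked in isolation. The sketch below is a minimal, hypothetical example (`LengthOf` and `LengthViaAsync` are made-up names, not from the question). It relies on the standard behavior that std::async, like std::thread, decay-copies its arguments and hands those copies to the callee as r-values.

```cpp
#include <cstddef>
#include <future>
#include <string>

// Hypothetical callee that, like ListDirectory2, takes its argument by r-value reference.
std::size_t LengthOf(std::string&& s)
{
    return s.size();
}

// Compiles even though `name` is a const l-value: std::async stores a decayed
// copy of each argument and passes that copy to the callee as an r-value, so
// LengthOf binds to the copy, never to the caller's object.
// (Calling LengthOf(name) directly would fail to compile.)
std::size_t LengthViaAsync(const std::string& name)
{
    std::future<std::size_t> fut = std::async(std::launch::async, LengthOf, name);
    return fut.get();
}
```

A `const std::string&` parameter (or `const std::filesystem::directory_entry&` in `ListDirectory2`) would work with std::async just as well, since the callee still receives a copy, and could also be called directly.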

2 Answers

There is a separate vect for each recursive call. But each call returns its vect, and the future generated by std::async hands that vect back to the caller through fut.get(). When you do:

        vectDirEntry lst = fut.get();
        std::copy(lst.begin(), lst.end(), std::back_inserter(vect));

for each of the futures dispatched with std::async, you consume the child's vect to populate the parent's vect (which the parent in turn returns).

The lst in that code is the vect returned by one of your recursive calls. The vect in that std::copy is the vect from the current ListDirectory2 call, implicitly captured by reference (because you began the lambda definition with [&], which means any variable the lambda uses but does not declare itself refers to the variable of that name in the enclosing scope).
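To see the `[&]` effect in isolation, here is a minimal sketch; the function names `FlattenByReference` and `FlattenByCopy` are hypothetical, and `int` stands in for `directory_entry`. The only difference between the two functions is the capture default: with `[&]` the lambda appends to the enclosing function's `vect`, while with `[=]` it appends to its own private copy, so the enclosing `vect` comes back empty.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// `[&]`: `vect` inside the lambda is a reference to this function's local
// `vect`, so every append is visible after std::for_each returns.
std::vector<int> FlattenByReference(const std::vector<std::vector<int>>& parts)
{
    std::vector<int> vect;
    std::for_each(parts.begin(), parts.end(), [&](const std::vector<int>& lst) {
        std::copy(lst.begin(), lst.end(), std::back_inserter(vect));
    });
    return vect;
}

// `[=]`: the lambda holds its own copy of `vect` (`mutable` lets it modify
// that copy), so the function's local `vect` is never touched and stays empty.
std::vector<int> FlattenByCopy(const std::vector<std::vector<int>>& parts)
{
    std::vector<int> vect;
    std::for_each(parts.begin(), parts.end(), [=](const std::vector<int>& lst) mutable {
        std::copy(lst.begin(), lst.end(), std::back_inserter(vect));
    });
    return vect;
}
```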

There's nothing unusual here; you explicitly copied from the sub-vects into the parent vect before returning each time, eventually building up a final vect in the top-most ListDirectory2 call that contains the results from every recursive call.
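The same pattern can be reproduced without touching the filesystem. This is a sketch under assumptions, not the question's code: `Dir` is a hypothetical in-memory tree and `ListTree` a hypothetical analogue of `ListDirectory2`. Each call owns its own `vect`, and the only way files from deeper levels reach the top is through the values the child calls return.

```cpp
#include <algorithm>
#include <functional>
#include <future>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a directory (not part of the original
// program): the files it contains plus its subdirectories.
struct Dir
{
    std::vector<std::string> files;
    std::vector<Dir> subdirs;
};

// Mirrors the structure of ListDirectory2: every call creates its own local
// `vect`, starts one async task per subdirectory, then appends what each
// task *returns* to that local `vect` before returning it.
std::vector<std::string> ListTree(const Dir& dir)
{
    std::vector<std::future<std::vector<std::string>>> finalVect;
    std::vector<std::string> vect(dir.files); // this call's files only

    for (const Dir& sub : dir.subdirs)
    {
        // std::cref avoids copying the subtree into the task; `sub` stays alive
        // because every future is waited on before this function returns.
        finalVect.push_back(std::async(std::launch::async, ListTree, std::cref(sub)));
    }

    for (std::future<std::vector<std::string>>& fut : finalVect)
    {
        std::vector<std::string> lst = fut.get(); // the child call's own `vect`
        std::copy(lst.begin(), lst.end(), std::back_inserter(vect));
    }
    return vect; // what the parent's fut.get() (or the top-level caller) receives
}
```

For a root holding `a.txt` whose subdirectories hold `b.txt` and `c.txt`, and `d.txt` plus a nested `e.txt`, `ListTree` returns all five names: the root's own files first, then each subdirectory's results in the order the futures are collected.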

As a side-note, you're performing a number of copies that aren't strictly necessary. You could avoid at least some of them by replacing your use of std::copy with std::move (besides the single-argument version that casts an l-value to an r-value reference, there's a three-argument version in <algorithm>, equivalent to std::copy, that moves from the source; since lst is destroyed at the end of each lambda call, there's no harm in leaving its elements in a moved-from state). A similar change could be made using the insert method of vect together with std::make_move_iterator (which might be slightly faster, since it lets the vector grow in bulk up-front for each bulk move), but the simple swap from std::copy to std::move is the minimalist solution and should be fast enough.
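A minimal sketch of the two alternatives mentioned above, written as hypothetical helper templates (`AppendWithMoveAlgorithm`, `AppendWithMoveInsert`) rather than inline in the lambda:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Variant 1: the three-argument std::move from <algorithm>, a drop-in for std::copy.
// Elements of `lst` are moved into `vect`; `lst` keeps its size but holds
// moved-from elements afterwards.
template <typename T>
void AppendWithMoveAlgorithm(std::vector<T>& vect, std::vector<T>& lst)
{
    std::move(lst.begin(), lst.end(), std::back_inserter(vect));
}

// Variant 2: vector::insert with move iterators. Because the range size is
// known up front, the vector can typically grow once for the whole range.
template <typename T>
void AppendWithMoveInsert(std::vector<T>& vect, std::vector<T>& lst)
{
    vect.insert(vect.end(), std::make_move_iterator(lst.begin()),
                std::make_move_iterator(lst.end()));
}
```

Inside the original lambda, the first variant amounts to replacing the std::copy line with `std::move(lst.begin(), lst.end(), std::back_inserter(vect));`. The moved-from elements left in `lst` are in a valid but unspecified state, which is harmless for a local that is about to be destroyed.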

ShadowRanger
  • Thanks for the response. I'm still confused. I mean, in each recursive call each function has its own 'std::for_each', its own local variable 'vect' and its own 'lst' variable; how come the files that are found in each separate recursive call and saved in a local variable 'vect' finally come together?! We don't have a reference to the variable 'vect'; we didn't pass 'vect' by reference. – Sami Jan 31 '23 at 16:20
  • @passionateProgrammer: Your lambda begins with `[&]`, which means "all variables from the current scope referenced within the lambda are exposed to the lambda by-reference". So the `vect` inside the lambda is a reference to the single `vect` for that invocation of `ListDirectory2`. The `for_each` populates that `vect` with the results from the recursive calls (which returned their own `vect`s; you extracted them to `lst`), and once `vect` is filled, you return it (either to a parent call when invoked recursively, or to the original outside caller when all recursive results are collected). – ShadowRanger Jan 31 '23 at 16:24

What you observe has nothing to do with async calls but is due to recursion.

Here's a flowchart describing it for three directory levels. Each vect is given a unique name here (and they are indeed distinct instances in the program).

ListDirectory2(dir)
vect <- file1.1   // put all files in dir in the local vect
        file1.2
dir1 ---------------> ListDirectory2(dir1) // call ListDirectory2 for each dir
                      vect1 <- file1.1 // put all files in dir1 in the local vect
                               file1.2
                      dir1.1 ---------------> ListDirectory2(dir1.1)
                                              ...
                      vect1 <- std::copy <--- return vect1.1
                      dir1.2 ---------------> ListDirectory2(dir1.2)
                                              ...
                      vect1 <- std::copy <--- return vect1.2
vect <- std::copy <-- return vect1

dir2 ---------------> ListDirectory2(dir2)
                      vect2 <- file2.1 // put all files in dir2 in the local vect
                               file2.2
                      dir2.1 ---------------> ListDirectory2(dir2.1)
                                              ...
                      vect2 <- std::copy <--- return vect2.1
                      dir2.2 ---------------> ListDirectory2(dir2.2)
                                              ...
                      vect2 <- std::copy <--- return vect2.2
vect <- std::copy <-- return vect2
return vect

When the call returns to main, vect will therefore be populated with all the files encountered from the starting directory and down.
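Because the behavior comes from recursion rather than from std::async, a plain synchronous version produces the same result. A minimal sketch, using a hypothetical in-memory `Dir` tree (not from the question) and a hypothetical `ListTreeSync` function:

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory directory (not part of the original program).
struct Dir
{
    std::vector<std::string> files;
    std::vector<Dir> subdirs;
};

// Plain recursion with no std::async: each call still owns a distinct local
// `vect`, and the parent appends whatever each child call returns, matching
// the "vect <- std::copy <-- return vectN" steps of the flowchart.
std::vector<std::string> ListTreeSync(const Dir& dir)
{
    std::vector<std::string> vect(dir.files); // files directly in `dir`
    for (const Dir& sub : dir.subdirs)
    {
        std::vector<std::string> lst = ListTreeSync(sub); // the child's own `vect`
        vect.insert(vect.end(), lst.begin(), lst.end());  // same effect as the std::copy
    }
    return vect;
}
```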

Ted Lyngmo
  • Thanks to both of you for the great help. You're right that the recursion was what confused me. So when we reach the last function call, where there is no subdirectory, only files, the for_each body never runs; the files are pushed into 'vect', and 'vect' is returned to the previous function call, which is waiting on 'fut.get()' for the data. That call's for_each then copies the returned data onto the back of its own 'vect' and returns the whole thing, and so on. I think the answer to my question is not std::copy; it is the value returned by each recursive call, which we get at 'fut.get()'. Right? – Sami Jan 31 '23 at 17:36
  • @passionateProgrammer You're welcome. It's the combination of returning the `vect` from each recursive call and then copying it into the local `vect` that makes it work. – Ted Lyngmo Jan 31 '23 at 17:41