9

I fully understand this question has been asked a lot, but I'm asking for a specific variation and my search-foo has given up, as I've only found algorithms that append one existing vector to another, but not one returned to from a function.

I have this function that lists all files in a directory:

vector<string> scanDir( const string& dir )

which may call itself internally (for subdirectories).

I need a short way of appending the returned value to the caller's vector. I have in my mind something like this (but of course it doesn't exist :( ):

vector<string> fileList;
//...
fileList.append( scanDir(subdirname) );

I fear that storing the return value and inserting it in fileList would bring performance badness. What I mean is this:

vector<string> temp( scanDir(subdirname) );
copy( temp.begin(), temp.end(), back_inserter(fileList) );

Thanks!

PS: I'm not forcing myself to using vector, any other container that performs equally well and can prevent the potential large copy operation is fine by me.

rubenvb
  • 74,642
  • 33
  • 187
  • 332
  • 1
    Related: http://stackoverflow.com/questions/2208293/what-is-the-most-efficient-way-to-append-one-stdvector-to-the-end-of-another – kennytm Jul 20 '10 at 19:23
  • 4
    When talking about performance, the first question is always: have you measure? Then the trivial ones, is this being run in a tight loop, is your application performance critical? How will copying a vector compare with retrieving a list of files from the filesystem? – David Rodríguez - dribeas Jul 20 '10 at 19:38

10 Answers10

16

Why not just pass the vector as an argument? Then every invocation can append to the same vector, without copying. Or create an implementation class which accumulates the elements into a member object.

Philipp
  • 48,066
  • 12
  • 84
  • 109
11

If you're in the position to change scanDir, make it a (template) function accepting an output iterator:

template <class OutIt>
void scanDir(const std::string& dirname, OutIt it) {
  // ...
  // Scan subdir
  scanDir(subdir, it);
  // ...
}

You'll have the additional benefit to be able to fill all sort of data structures like

std::vector<string> vector;
scanDir(dir1, std::back_inserter(vector));
std::set<string> fileset
scanDir(dir1, std::inserter(fileset, fileset.begin()));

etc.

EDIT (see comment ...)

For using this function for class member initialization, you could either call it in the constructor as in

class MyClass {
private:
  std::vector<string> m_fileList;
public:
  MyClass(const std::string& dirname) {
    scanDir(dirname, std::back_inserter(m_fileList);
  }
}

or using a wrapper function

std::vector<string> scanDir(const std::string& dirname) {
  std::vector<string> result;
  scanDir(dirname, std::back_inserter(result);
  return result;
}

class MyClass {
// Same as above..
  MyClass(const std::string& dirname) : m_fileList(scanDir(dirname)) { }
}

I would prefer the first version for performance (and other) reasons ...

MartinStettner
  • 28,719
  • 15
  • 79
  • 106
  • Could I use such a template to initialize a class data member (which holds the file list?)... oh wait, yours combined with sukru's is pretty powerful (second function to create the return value). – rubenvb Jul 20 '10 at 19:32
  • Just wrote an edit about this. I would suggest calling the function explicitely in the constructor, I cannot see any drawbacks of this approach ... – MartinStettner Jul 20 '10 at 19:41
  • It's generally called `OutputIterator` rather than `InsertIterator`. – Matthieu M. Jul 21 '10 at 07:19
  • Thanks for the great suggestion. I've got to start thinking more `iterator` than container. After some trouble splitting template declaration and implementation, I've got it right. – rubenvb Jul 21 '10 at 15:28
9

PS: I'm not forcing myself to using vector, any other container that performs equally well and can prevent the potential large copy operation is fine by me.

Well, if you use a list and call a.splice(a.end(), b); you'll avoid the copy operation completely. A list is generally going to be a linked list rather than an array as is the case with a vector, so this has a lot of performance and usage implications. But splice runs in O(1), so that's a nice benefit.

Philipp
  • 48,066
  • 12
  • 84
  • 109
Brian
  • 25,523
  • 18
  • 82
  • 173
3

Use std::list and append by using std::list::splice.

From the docs for splice:

The operation does not involve the construction or destruction of any element object and, except for the third version, it is performed in constant time.

Philipp
  • 48,066
  • 12
  • 84
  • 109
cape1232
  • 999
  • 6
  • 21
3
vector<string> fileList;
vector<string> temp( scanDir(subdirname) );

fileList.insert(fileList.end(), temp.begin(), temp.end());

I hope that helped you.

virious
  • 571
  • 1
  • 8
  • 27
  • insert is a linear time operation. The question specifically said your suggestion was not what he's looking for. "I fear that storing the return value and inserting it in fileList would bring performance badness." – cape1232 Jul 20 '10 at 20:03
  • "I fear", for me it means that he is not sure if this will cause performance badness. To be honest, I don't think that using insert() in this case will cause any performance issues. Anyways, thanks for the minus -,-. – virious Jul 21 '10 at 07:01
  • This answers the title of the question white well. So it should be easy to find this solution, when peole like me come here in search of a solution for just that. – kuester2000 Oct 19 '10 at 14:23
2

Instead of

vector<string> temp( scanDir(subdirname) );

you can do

vector<string> const& temp = scanDir(subdirname);

and proceed with the copy:

fileList.insert(fileList.end(), temp.begin(), temp.end());
Alexandre C.
  • 55,948
  • 11
  • 128
  • 197
  • Unlikely to make any difference, once copy constructor elision is finished with it. There is no requirement for the first line of code to do anything that the second line of code isn't also required to do. – Steve Jessop Jul 21 '10 at 00:14
0

The recursive function will have to copy everything multiple times, O(depth) to be precise (i.e: everything in the leaf level will be copied again and again until it reaches the root).

The best method would be splitting this into two different functions:

vector<string> scanDir(string path)
{
  vector<string> retval;

  scanDir(path, &retval);

  return retval;
}

static void scanDir(string path, vector<string>& output)
{
  .. scan
  .. append to output 
}
sukru
  • 2,229
  • 14
  • 15
  • in your first overload, a complete copy is of retval is guaranteed to be made when returning. – Alexandre C. Jul 20 '10 at 19:31
  • @alexandre: how would I avoid that when using this function to initialize a class data member? Wouldn't there be some compiler magic avoiding the copy? – rubenvb Jul 20 '10 at 19:35
  • @rubenvb: return value optimization takes place when you return with a constructor. Use a class instead, see MartinStettner's comment. – Alexandre C. Jul 20 '10 at 19:56
  • I always thought the return value optimization were specifically for this case. Thus I'm learning a new thing today. Is there a reference to check the details? – sukru Jul 23 '10 at 00:15
0

How about a helper function?

template<class T>
std::vector<T>& VectorAppend(std::vector<T> &target, const std::vector<T> &source)
{
    size_t insertPos = target.size();
    target.resize(target.size() + source.size());
    std::copy(source.begin(), source.end(), target.begin() + insertPos);
    return target;
}
David Gladfelter
  • 4,175
  • 2
  • 25
  • 25
  • What's wrong with a simple `std::copy(source.begin(), source.end(), std::back_inserter(target));` instead? If you're trying to optimize allocation, you could always call `target.reserve(target.size()+source.size());` before-hand. – sbi Jul 22 '10 at 09:02
-1

This might not be the simplest possible solution, but what about doing something equivalent to C#'s StringBuilder?

Make a list<vector<string> > then you can append all of the vectors that you get from your calls to scanDir() to the list.

If you absolutely have to have a single vector at the end, you can then, once make a new vector, allocate it large enough so that it won't need to resize, and assemble your finished product.

Alternatively, you could make a new class (if needed that derives from vector<T>) and internally uses the list<vector<T> > to store the elements. Then you would just make your iterators iterate through the elements in the first list, then when it reaches the end go for the elements in the next list, only returning container::end when you reached the end of the last list.

Andrew Shelansky
  • 4,932
  • 3
  • 20
  • 19
-1

I know this doesn't answer your question directly, but with regard to your underlying goal, you might want to simply reimplement your function in terms of boost::filesystem. The directory iterator is already recursive so you don't need to make recursive calls of your own. You could simply populate a list in a loop over the iterator. There is an example implementation of ls: http://www.boost.org/doc/libs/1_43_0/libs/filesystem/example/simple_ls.cpp

You also get the added benefit of (theoretical) platform independence, relatively wide adoption (bugs get ferreted out more quickly with more adoption), etc

frankc
  • 11,290
  • 4
  • 32
  • 49