7

I have structs, let's call them sn, that look like:

struct sn {
    string name;
    vector<sn*> connected_to;
};

Now, suppose I have the connected_to vector already declared from 0 - 9; and I'm connecting sn A, to sn B:

A.connected_to[0] = &B; 

I have a feeling that I'm going about this the wrong way. Essentially what I'm trying to do is avoid copying the struct when I'm connecting the structs... i.e:

struct sn {
    string name;
    vector<sn> connected_to;
};

// ... 

A.connected_to[0] = B; 

Does this copy anything? The more fundamental problem is of course I don't understand how vectors, pointers, and referencing really works deeply.

tshepang
  • 12,111
  • 21
  • 91
  • 136
Deniz
  • 1,481
  • 3
  • 16
  • 21
  • 2
    Trying things out is usually a good way to learn how they really work. I can handle simple & and * with primitive types, and passing them by reference etc. to functions -- but the above requires understanding how struct and vector is implemented etc to some degree =) – Deniz Dec 23 '11 at 09:31
  • To learn I suggest you implement minimal test cases of the different approaches you consider; use gdb or another debugger to run these minimal examples and watch how they behave in detail -- than you can make an informed decision. – Tom Regner Dec 23 '11 at 09:34

2 Answers2

12

Your second approach is probably illegal. However, in some implementations of the standard library it might work. In those cases, the objects you add would get copied (including all their children - when a standard container is copied, all the elements it contains are copied too). Thus, such a data-structure would only be suitable for representing a tree.

Your first approach, on the other hand, is fine, because a pointer to an incomplete type is itself a valid type (§3.9.2/3 - [basic.compound]). Since you only store a pointer, the object will not get copied. You have to take care though when you start deleting this graph. Depending on what type of graph you are modeling, there are three scenarios when implementing them:

There are some restrictions. Note that in your case the type is only incomplete inside the definition (of sn) - at the point you actually use it, sn is complete, hence you can also delete it then.

Trees

In case of a tree, every child has exactly one parent. Thus, when deleting the structure, you would start from the the root, and each node would just have to delete all of its children. This would work recursively to the leaves, which have no children.

To implement this efficiently, you could store the children in a boost::ptr_vector<sn>. Thus, you will not have to write a destructor yourself - the ptr_vector will delete all its elements.

Directed Acyclic Graphs (DAG)

In a DAG a node can have multiple parents, thus you have to take care not to delete the same node twice (this would happen if each node just deletes all its children - for this reason, a ptr_vector will not work here). One way to handle this is to use reference-counting - each nodes counts how many other nodes point to it, and only when the reference-count reaches zero, the node is actually deleted. You can automate this by storing the nodes in a std::vector<std::shared_ptr<sn> > (or boost::shared_ptr if you use a pre-C++11 compiler). The shared_ptr manages the reference-counting internally, and will only delete the object it points to when there are no more shared_ptr-instances pointing to that object (when the reference-count is zero).

Cyclic Graphs

In a cyclic graph, a node can also be its own parent (either directly, if it contains loops, or indirectly, through a cycle). Thus, if each node deletes all its children, that would result in an infinite loop of destructor-calls. A shared_ptr could also fail here, because when you have a cycle of shared_ptr referencing each other, their reference-count will never reach zero. Now it is time to think about the difference between owning an object and referencing it. Each node should have exactly one parent that owns it, but can have several that reference it. The owner, and only the owner, is responsible for deleting that node. As explained in the excellent answer I linked above, this can be implemented using a combination of shared_ptr and weak_ptr.

Community
  • 1
  • 1
Björn Pollex
  • 75,346
  • 28
  • 201
  • 283
  • could you comment a bit on how I can take care when deleting / how to delete? thank you :) how about the second approach? are things copied in that case? – Deniz Dec 23 '11 at 09:34
  • Further to this: You want to determine who owns the structs - i.e. whos responsibility is it to delete them when you're done. Often this can be a larger structure containing all the structs. – Michael Anderson Dec 23 '11 at 09:34
  • @Deniz: I do not have the time now, but I will get back to this answer later (of not somebody else does so first). – Björn Pollex Dec 23 '11 at 09:41
  • Thanks @BjörnPollex , looking forward to it. – Deniz Dec 25 '11 at 02:56
  • 1
    @Deniz: I updated my answer - hope it helps. I did not add any code-examples, as that would have become to long - I hope it helps you to decide which approach to use though. If you encounter problems with that then, you can ask a specific question about that. – Björn Pollex Dec 26 '11 at 21:31
  • much appreciated -- thank you very much for the time & effort. – Deniz Dec 26 '11 at 21:49
  • (I do a topological sort on my DAG & keep all of them in one "collection" of nodes as well to keep track of them) – Deniz Dec 26 '11 at 21:50
3

A.connected_to[0] = &B;

Does copy something: the temporary pointer value of the expression &B.

The vector template class will always do automatic copy construction and destruction, but copy construction of a primitive type is equivalent to assignment and destruction of a primitive type, including pointers, is a no-op.

A pointer is a very basic type - almost nothing is automatically done for you when using pointers. Under the hood it's just an integral value corresponding to an address in memory. When you dereference a pointer, the compiler simply trusts you that the pointer hold the address of (or "points at") an object of the correct type.

For example, given classes Foo and Bar which are not related by inheritance:

Foo *ptr1, *ptr2;
Bar *ptr3;
  // All pointers are uninitialized. 
  // Dereferencing them is undefined behavior. Most likely a crash.
  // The compiler will almost certainly issue a warning.
ptr1= new Foo(); // ptr1 now points to a valid Foo.
ptr2 = ptr1; // ptr2 points to the same Foo.
ptr3=(Bar*)ptr1; // This is an obvious programmer error which I am making here for demonstration.
 // ptr3 points to the same block of memory as ptr1 & 2.
 // Dereferencing it is likely to do strange things.
delete ptr1; // The compiler is allowed to set ptr1 to 0, but you can't rely on it.
  // In either case dereferencing ptr1 is once again undefined behavior
  // and the value of ptr2 is unchanged.

The compiler is much less likely to issue a warning if it sees ptr1 dereferenced after deletion than before initialization. It will virtually never issue a warning if you dereference ptr2 after deleting the object through ptr1. If you are not careful as others have warned you, your vector of pointers may lead you to inadvertently invoke undefined behavior in this way.

I introduced the extremely wrong cast of a Foo* to a Bar* to illustrate the compiler's absolute trust in you. The compiler allows you to do it and will happily treat the bits as if they were a Bar when you dereference ptr3.

The C++ standard library provides few template classes that offer pointer-like behavior with much more automatic safety. For example, the std::shared_pointer:

std::shared_ptr is a smart pointer that manages lifetime of an object, typically allocated with new. Several shared_ptr objects may manage the same object; the object is destroyed when the last remaining shared_ptr pointing to it is destroyed or reset. The object is destroyed using delete-expression or a custom deleter that is supplied to shared_ptr during construction.

If your environment does not yet provide the c++11 standard library, it might provide either the boost library or the std::tr1:: namespace. Both provide a very similar shared_ptr. std::auto_ptr, which you are sure to have, is similar but allows only one auto_ptr to refer to an object at a given time. (C++11 introduces std::unique_ptr as an intended replacement for auto_ptr. The auto_ptr is incompatible with most std template containers. unique_ptr can be placed in a template container with std::move.)

It is possible to break either of these classes by keeping or obtaining the pointer and monkeying around with it, e.g.

Foo *basic_ptr=new Foo();
std::auto_ptr<Foo> fancy_ptr(basic_ptr);
delete basic_ptr; // Oops! This statement broke our auto_ptr.

You will also break them if you pass in an address of a variable in automatic storage:

Foo aFoo;
std::auto_ptr<Foo> fancy_ptr(&aFoo); // automatic storage automatically breaks auto_ptr

If you just do std::shared_ptr<sn> fresh_sn(new sn()) and then use a std::vector< std::shared_ptr<sn> > you'll be fine.

01d55
  • 1,872
  • 13
  • 22
  • 1
    +1: Good answer. Two notes: 1) the *extremely wrong* cast from `Foo*` to `Bar*` is only accepted by the compiler because the example uses an unsafe C-Style cast. For this very reason it is considered good practice to use C++-style casts - `static_cast` would have resulted in a compiler-error here. 2) Some older compilers also provide `std::tr1::shared_ptr`, even without C++11-support. – Björn Pollex Dec 25 '11 at 10:08
  • @Xeo It is now. Thanks for the heads-up. – 01d55 Dec 27 '11 at 07:25