0

I'm not facing a "problem" actually, as my code does work. I'm just curious about whether my implementations are reasonable and riskless.

I've been working on a project using C++, in which I first parse a file and then build a directed-acyclic-graph structure accordingly. Each nodes may have 0~2 out-neighbors depending on the type of the node. For different types of nodes, some functions for printing and accessing are needed, and I decided to do it using polymorphism.

My first trial was to implement it with nodes storing pointers to its out-neighbors.

class Base{
public:
  Base(){}
  virtual ~Base(){}
  virtual foo()=0;
  // ...
protected:
  unsigned _index;
}

class Derived1: public Base{
public:
  foo(){ /*Do something here...*/ }
private:
  Base* _out1;
}

class Derived2: public Base{
public:
  foo(){ /*Do something different here...*/ }
private:
  Base* _out1;
  Base* _out2;
}

int main(){
  std::vector<Base*> _nodeList;
  for(/*during parsing*/){
    if(someCondition){
      _nodeList.pushback(new Derived1);
    }
    // ...
  }
}

Since the out-neighbor of a node may be yet to define when the node is constructed, I have to add some tricks to first remember id of the out-neighbors and connect them after finishing the construction of all nodes.

However, since the number of nodes are determined given the file to parse and will not grow ever after, I consider it better to store all nodes contiguously and each node store the indices of its out-neighbors instead of pointers. This allows me to skip the connection part and also brings some minor benefits to other parts.

My current version is as follows:

// Something like this
class Base{
public:
  Base(){}
  virtual ~Base(){}
  virtual foo()=0;
  // ...
protected:
  unsigned _index;
  unsigned _out1;
  unsigned _out2;
}

class Derived1: public Base{
public:
  foo(){ /*Do something here...*/ }
}

class Derived2: public Base{
public:
  foo(){ /*Do something a little bit different here...*/ }
}

int main(){
  // EDITED!!
  // Base* _nodeList = new DefaultNode[max_num];
  Base* _nodeList = new Derived2[max_num];
  for(/*during parsing*/){
    if(someCondition){
      // EDITED!!
      // _nodeList[i] = Derived1;
      new(_nodeList+i) Derived1();
    }
    // ...
  }
}

My questions

  1. Are there any risks to store objects of different class in a newed array, given that they are all of the same size and can be destructed using a virtual destructor?

  2. I've always heard that the use of new[] should be avoided. I did found some approaches that achieve what I want using vector of union with a type tag, but it seems somewhat dirty to me. Is there a way to achieve polymorphism while storing data in a std::vector?

  3. Is the practice of using polymorphism merely to make use of the convenience of virtual functions consider a bad habit? By saying so I mean if the memory taken by each object is already the same for each derived class, then they may be merged into one single class that store its own type, and each member function can just behave according to its own type. I chose not to do so since it also looks dirty to me to have huge switch structure in each member function.

  4. Is it good to choose contiguous memory in this case? Are there any reasons that such choice may be harmful?

EDIT:

It turns out that I'm making many mistakes such as asking too many questions at a time. I think I'll first focus on the part with polymorphism and placement new. The following is a testable program of what I mean by "storing objects of different derived classes in an newed array, and it behaves on my laptop as shown below.

#include <iostream>

class Base{
public:
  Base(){}
  virtual ~Base(){}
  void virtual printType() =0;
};

class Derived1: public Base{
public:
  Derived1(){}
  void printType(){ std::cout << "Derived 1." << std::endl; }
};

class Derived2: public Base{
public:
  Derived2(){}
  void printType(){ std::cout << "Derived 2." << std::endl; }
};

int main(){
  Base* p = new Derived1[5];
  new(p+2) Derived2();
  for(unsigned i = 0; i < 5; ++i){
    (p+i)->printType();
  }
}

Outcome:

Derived 1.
Derived 1.
Derived 2.
Derived 1.
Derived 1.

Again, thanks for all the feedbacks and suggestions.

tropurchan
  • 13
  • 4
  • 1
    You'd better post it on Code Review - stackoverflow is dedicated to error handling while code review is what you look for. – ALX23z Dec 09 '19 at 10:41
  • So the derived classes **never** add extra member variables? If so why use the derived classes at all rather than a single node type? – Fire Lancer Dec 09 '19 at 10:41
  • 3
    1) What is `DefaultNode`? 2) `_nodeList[i] = Derived1;` is not valid C++, no matter what. – Sam Varshavchik Dec 09 '19 at 10:43
  • @ALX23z Well the short answer is that no, you can't "Are there any risks to store objects of different class in an newed array". – Fire Lancer Dec 09 '19 at 10:43
  • Check out `std::variant` if you want to store different types in contiguous memory. You can use `std::visit` for polymorphism – parktomatomi Dec 09 '19 at 10:44
  • @ALX23z Thanks for the suggestions. I'll check it out. – tropurchan Dec 09 '19 at 10:46
  • You could store a `std::function` in the class and point it to the right version of `foo`. Scrap the inheritance all together. – super Dec 09 '19 at 10:46
  • @FireLancer That's my third question. I use different derived class to implement different access and print functions. – tropurchan Dec 09 '19 at 10:47
  • In this case I'd simply use a single class non-polymorphic and make `if/else` statement inside the `foo()` function to distinguish the two cases. – ALX23z Dec 09 '19 at 10:49
  • the only risk I see here is that the code does not compile. When asking for a code review you should show complete code. Afaik this also wouldnt be well received on code review – 463035818_is_not_an_ai Dec 09 '19 at 10:49
  • @SamVarshavchik Thanks for pointing out the mistakes.1) DefaultNode is also derived from the Base. 2) Should be _nodeList[i] = Derived1(); I'll fix that right away. – tropurchan Dec 09 '19 at 10:52
  • Are you sure your code "works" ? You haven't shown representative code, but from what you've shown and explained, I don't see how it can. You can't use polymorphism when you store the objects ([sliced](https://stackoverflow.com/questions/274626/what-is-object-slicing)) in a container of the base type. – Sander De Dycker Dec 09 '19 at 10:52
  • No, ` _nodeList[i]=Derived1();` will not work either. C++ does not work this way. This kind of micro-optimization is mostly pointless. On modern, multi-Ghz CPUs, a few hundred ordinary `new`s is mostly line noise. – Sam Varshavchik Dec 09 '19 at 10:55
  • @ALX23z I'm not sure if I've mistaken something, but I thought that I've successfully added objs of different derived class in a newed array and it's working as what I've expected(?) – tropurchan Dec 09 '19 at 10:56
  • Oops. I think I found the problem. I'm not doing it the way I claimed I did. I use placement new instead. So sorry for the mistake... – tropurchan Dec 09 '19 at 10:59
  • One question per question please. This is really broad. – Lightness Races in Orbit Dec 09 '19 at 11:10
  • @SanderDeDycker I've edited my post with a testable program with placement new and polymorphism. It would be great if you could tell me where I've gone wrong. Thanks a lot! – tropurchan Dec 09 '19 at 11:28

2 Answers2

1
  1. Are there any risks to store objects of different class in an newed array, given that they are all of the same size and can be destructed using a virtual destructor?

This is not what happens in your second proposition:

Base* _nodeList = new DefaultNode[max_num];

_nodeList is an array of DefaultNote and nothing else! Trying to store something in it like _nodeList[i] = ... will never change anything about the nature of stored objects (note that _nodeList[i] = Derived1; is not C++). If you want polymorphism you need to retain objects either through pointers or references. Then the first solution is the correct one: std::vector<Base*> _nodeList;.

  1. I've always heard that the use of new[] should be avoided. I did found some approaches that achieve what I want using vector of union with a type tag, but it seems somewhat dirty to me. Is there a way to achieve polymorphism while storing data in a std::vector?

the use of new[] should be avoided is a non-sense. As said before, if you need polymorphism then std::vector<Base*> _nodeList; is perfect, because that means that you can store in _nodeList the address of any object whose class is either Base or any subtype of.

  1. Is the practice of using polymorphism merely to make use of the convenience of virtual functions consider a bad habit? By saying so I mean if the memory taken by each object is already the same for each derived class, then they may be merged into one single class that store its own type, and each member function can just behave according to its own type. I chose not to do so since it also looks dirty to me to have huge switch structure in each member function.

Subtyped polymorphism is the use of virtual functions. Why bad habit? If you don't use virtual functions that just means that you are constructing the polymorphism by yourself, which is probably a very bad thing.

Now, if your derived classes are just like what was proposed in your exemple, I can suggested you not to use subclasses but only ctor overloading...

  1. Is it good to choose contiguous memory in this case? Are there any reasons that such choice may be harmful?

I'm not sure to really understand why this is a concern for you. Contiguous memory is not harmful... This question is at least not clear.

Jean-Baptiste Yunès
  • 34,548
  • 4
  • 48
  • 69
  • Well, I suppose the 4th one is that usually trying to get contiguous memory (and/or anything other desired memory layout characteristic) for the entire data structure is more complex, (your `std::vector` or any smart pointer alone would not achieve that, you would need to allocate those elements, using a special allocator, a pre-created `T[n]` for each derived, or something). A common reason to do it is optimization due to how the CPU cache and memory I/O works. – Fire Lancer Dec 09 '19 at 10:56
  • @FireLancer Yes I agree on this of course, but it seems to me that OP is very confused about too many things. – Jean-Baptiste Yunès Dec 09 '19 at 11:02
  • Thanks for the answer! For (1), I've updated an edited version of how I manage to use placement new to store objects of different classes in an array. Am I doing something wrong or dangerous? For (2), I am aware that it's commonplace to use `vector` to achieve polymorphism, but what I'm actually asking is whether it is possible to store the actual object in contiguous memory. As for (3), constructor overloading doesn't seem to give me different member function with the same interface to operate with(?). Again, thanks a lot for the answer! – tropurchan Dec 09 '19 at 13:20
  • I suggest you to write another question, this one is overloaded. Quick: you can not use placement new that way, even if it seems to work, that code is wrong. – Jean-Baptiste Yunès Dec 09 '19 at 13:44
  • Okay, I'll reorganize it and write another question. Thanks for the suggestions. – tropurchan Dec 09 '19 at 15:07
0

The problem is that normally you cannot allocate different polymorphic classes inside a vector or array - only pointers to them. So you cannot make it contiguous.

In your case usage of polymorphism is most probably a bad idea. It will result in poor memory fragmentation and slow performance due to issues with lots of virtual calls and fails on the branch prediction. Though, if there aren't many nodes or you don't use it too frequently in your code - then it won't affect the overall performance of the program.

To avoid this, simply store nodes' data (and make it a plain struct) inside a vector and utilize separate classes that implement those foo() functions.

Example:

std::vector<NodeData> nodes;
class Method1
{
   public:
   static void Process(NodeData& node);
   ...
}

class Method2
{
   public:
   static void Process(NodeData& node);
   ...
}

Then you can either make a single switch to choose which method to apply or, say, store nodes' data inside several vectors so that each vector identifies which method to use.

ALX23z
  • 4,456
  • 1
  • 11
  • 18
  • This seems quite suitable for my case and I'll try it. However, I'm not so clear on the part that _"uses of polymorphism results in poor memory fragmentation."_ Could you please explain that or show me where / with what keywords I can get to learn about that? Thanks a lot for the help! – tropurchan Dec 09 '19 at 13:36
  • @tropurchan it is not that polymorphism actually does it. The issue is that you'll have to store the data in format `std::vector` or `std::vector>` which implies that each member is stored separately - applying `new` a million times to small objects is a bad idea. It can be resolved by allocating them in chunks of types... but then it is very messy. – ALX23z Dec 09 '19 at 14:01
  • Thanks for the reply. That's exactly what I originally did and what you've mentioned is one of the reasons I chose to implement it differently. So, is what I updated in the EDIT part invalid? Cause if I understand it right, I somehow did succeed in allocating objects of different polymorphic classes in an array...(?) – tropurchan Dec 09 '19 at 14:08
  • @tropurchan polymorphism is usually implemened via a `vtable` so if derived class has no extra data and you simply copy it into a base class then the `vtable` pointer will also be copied - which will result in calling the functions from the derived class whenever you call a virtual function. However, as far as I understand this is a UB and standart doesn't guarantee for it to work and it might result in odd stuff in various places. – ALX23z Dec 09 '19 at 14:24
  • So even placement new can't guarantee it behaves as what I expected? I thought it's something different from copying it into a Base object. Did I misunderstand the use of placement new? – tropurchan Dec 09 '19 at 14:30
  • @tropurchan I believe so. Also, in recent talk on CppCon it was mentioned that `new(p) myClass` might do strange things - like using nearby memory to store information regarding allocation. The guy was very frustrated over this fact. Basically, `placement new` is kinda broken. – ALX23z Dec 09 '19 at 14:51
  • Cool! Thanks so much for all the information. I think that's all I want to ask. Sorry to have asked so messily. I'll try to make it clear next time. – tropurchan Dec 09 '19 at 15:04