3

I am implementing a system that stores and manipulates a lot of repetitive short strings. For example stock price series. I will have a lot of repetitive entries of Microsoft stock prices:

<time1>,MSFT,60.01
<time2>,MSFT,60.02
<time3>,MSFT,60.00

I am thinking of using Boost::Flyweight to optimize the memory allocation, string lookup/comparison/copying cost of those small repetitive ticker names (like MSFT in this case).

But the thing is those strings are pretty small to begin with -- usually just a few bytes. While a long type is 8 bytes already in modern computers. Is it worth it to use Boost::Flyweight in this case?

My understanding of Boost::Flyweight is that it internalized strings are integers to improve performance. But I think lookup/comparison/copying a 8-byte string wouldn't be dramatically different from operating on a 8-byte long datatype. So is it worth the hassel of moving to Boost::Flyweight ?

My main goal is more on the speed optimization side as opposed to memory optimization side, if I have to choose one.

CuriousMind
  • 15,168
  • 20
  • 82
  • 120
  • A few thoughts come to mind: 1) I'm not familiar with Boost's implementation, but I feel like the flyweight pattern is primarily for memory savings, not speed. 2) Don't forget about cache locality. A flyweight is almost certainly going to be in a different part of memory than everything else you are working with "locally", which means cache misses. 3) If your compiler is new enough, you should get small string optimization, which allocates strings on the stack if they are short enough. That may make a bigger difference than flyweight. But I think you should run some tests to be sure. – 0x5453 Oct 27 '16 at 13:35
  • I agree with @0x5453 that in this particular case storing the strings as NUL-terminated char arrays (e.g. `std::array`) could be the best for performance. Of course it depends on the scale of the allocations – sehe Oct 27 '16 at 13:38
  • @0x5453 thanks for your reply. I feel like this should be an answer not a comment. :) – CuriousMind Oct 27 '16 at 15:28
  • 1
    Are you wanting the object with id "MSFT" to have any functions, or just store a four character code? In the flyweight pattern, the object for a specific code is assumed to have more to it, like you can look up various things about the company from its flyweight object. – Kenny Ostrom Oct 27 '16 at 15:42
  • Also, how many distinct four character codes do you need to track? If for example, you know you will have less than 65k then you can cut the memory storage in half after reserving enough space for a lookup table. Since there is a small flyweight pool, you only have to be able to find the object. This doesn't always require a 64 bit pointer to an object. You could just have a short int as the lookup index. – Kenny Ostrom Oct 27 '16 at 15:43
  • @KennyOstrom reminds me of a specific case I did for stock quotes: https://stackoverflow.com/questions/16141178/is-it-possible-to-map-string-to-int-faster-than-using-hashmap/16141214#16141214 – sehe Oct 28 '16 at 10:35

1 Answers1

4

Flyweight is very generic and configurable.

I'd suggest using a backing of strings allocated from a single, fixed-size pool of memory (e.g. std::vector<CharType>). You would then only need to return std::string_views to range of bytes in the backing storage.

You can use FlyWeight to configure things like so, but I'd need to find time to demo it.

Alternatively, you could "roll your own". I have some samples of that on StackOverflow:

My experience with Flyweight has varied (https://stackoverflow.com/search?tab=votes&q=user%3a85371%20flyweight, e.g. boost multi_index_container and slow operator++). It seems that naive implementation of Flyweight is rarely what you want.

UPDATE Just remembered this related demo I made using Perfect Hashing for NASDAQ ticker symbols: Is it possible to map string to int faster than using hashmap?

Community
  • 1
  • 1
sehe
  • 374,641
  • 47
  • 450
  • 633