Most optimal way to remove an index from an vector-like data structure without copying data in C++

Question

The problem:

I need to create a simple vector or vector-like data structure that has indexable access, for example:

arr[0] = 'a';
arr[1] = 'b';
//...
arr[25] = 'z';

From this structure, I would like to remove some index, for example index [5] The actual value at the index does not need to be erased from memory, and the values should not be copied anywhere, I just need the indexes of the data structure to re-arrange afterward, so that:

arr[0] = 'a';
//...
arr[4] = 'e';
arr[5] = 'g';
//...
arr[24] = 'z';

Is std::vector the best data structure to use in this case, and how should I properly remove the index without copying data? Please provide code.

Or is there a more optimal data structure that I can use for this?

Note, I am not intending on accessing the data in any other way except through the index, and I do not need it to be contiguously stored in memory at any time.

Use [`std::map`](https://en.cppreference.com/w/cpp/container/map). vectors/arrays are contiguous and can have no missing index positions. `std::map` is indexable, iterable and can have indexes (and their values) erased. — Richard Critten, Jan 23 '23 at 21:29
Not much you can do other than have a parallel array of booleans or a mask that flags each index as in use or not. As you can guess, over time the extra checking of the in-use flags costs more than the value-moving would have. — user4581301, Jan 23 '23 at 21:29
If you do inserts/deletes a lot, you could use a `std::list`, but then you wouldn't be able to use indexing. You'd have to rely on iterators. — Ted Lyngmo, Jan 23 '23 at 21:31
What is your actual use case? Are you going to modifying this container a lot or just once? How big is the container going to be? How often are you going to access a random element? All of that will help determine the right container for you. — NathanOliver, Jan 23 '23 at 21:33
Why should they not be copied? If ist is for performance resons, copying about 20 bytes is probably *much* more performant than any other way of keeping track of what members are left. — BoP, Jan 23 '23 at 21:38
@RedBox A linked list is almost never favorable. In this case, to remove an element at a specific index, you probably need to iterate over all the elements until you reach it. — François Andrieux, Jan 23 '23 at 21:45
Can you please share your motivation for not accepting any copies? It looks like you have a vector of 26 ints. Copying aven 25 ints when you remove the first element will be incredibly fast, and any alternative data structure will inevitably introduce new drawbacks. — François Andrieux, Jan 23 '23 at 21:51
Have an array of _pointers to the actual things you don't want to copy_. Don't worry about how to remove an element from a vector -- just use erase. See [How do I erase an element from std::vector<> by index?](https://stackoverflow.com/questions/875103/how-do-i-erase-an-element-from-stdvector-by-index) — Wyck, Jan 23 '23 at 22:00
My use case can have thousands of entries, but this is for an algorithm that I want to be as efficient as possible. Can you provide an example with std::map in this case? — Mich, Jan 23 '23 at 22:35
map doen't solve this, since it doesn't re-arrange the indecies — Mooing Duck, Jan 23 '23 at 23:19
At best a `map` looks like it will solve the problem, but it will drop the performance of regular loop-ups significantly. You'd have to do a lot more removal than iteration or look up to break even. Yeah, `vector` is O(n) with all the element-shifting on deletion, but that O(n) is in a straight line, and a modern computer absolutely kills highly-predictable algorithms. A periodic O(n) is a great trade off to keep everything else the `vector` does at O(1). Meanwhile, pretty much everything you do with the `map` will be O(log(n)) with a bunch of pointer-chasing... — user4581301, Jan 24 '23 at 00:04
...and I can't see a `unordered_map` not degrading into a series of linked lists unless `arr` is very sparse. — user4581301, Jan 24 '23 at 00:05
@Mich: "*this is for an algorithm that I want to be as efficient as possible*" That's complicated. The performance of the implementation of that algorithm is going to depend on a number of factors. While erasing an element of a `map` is (somewhat) cheap, *iterating* through a `map` is not. Indeed, *indexing* a `map` is not cheap either. The cost of those operations may matter more for your algorithm, such that paying the cost of the copy a small price to pay. You need to profile this stuff. — Nicol Bolas, Jan 24 '23 at 22:19
You could use `std::vector>` and skip the `std::nullopt` entries (those would indicate a "hole"), if erasing from a `std::vector` is too expensive. — Eljay, Jan 24 '23 at 23:06
Before answering: Is the actual data also a char or do you want good performance also for more complex data-structures? — Hans Olsson, Jan 25 '23 at 15:10

Something Something · Answer 1 · 2023-01-23T22:38:02.023

0

What you want is probably covered in one of these:

what has been proposed for std::hive

Hive is a formalisation, extension and optimization of what is typically known as a 'bucket array' container in game programming circles; similar structures exist in various incarnations across the high-performance computing, high performance trading, 3D simulation, physics simulation, robotics, server/client application and particle simulation fields.

std::flat_map in C++23

A flat_map is a kind of associative container that supports unique keys (contains at most one of each key value) and provides for fast retrieval of values of another type T based on the keys.

edited Jan 23 '23 at 22:38

answered Jan 23 '23 at 22:32

Something Something

3,999
1
6
21

these do not exist yet in c++? – Mich Jan 23 '23 at 22:33
1

@Mich define "exist in C++" :) They likely exist in some shape or form on Github etc but not in the C++ standard. std::flat_map is approved for C++23 but there is no implementation yet on gcc/clang. There is a [boost flat_map](https://www.boost.org/doc/libs/1_81_0/doc/html/boost/container/flat_map.html). Not sure what's the status of std::hive at this moment – Something Something Jan 23 '23 at 22:43
1

You have made a simple question rather more complicated than I think it warrants. There is no universally optimal data structure or container. We don't know from the question what the other access usages are, but "vector-like" suggests a _sequential container_ perhaps? No need to introduce newer ever more esoteric or specialist containers to a simple "classic" problem. – Clifford Jan 23 '23 at 23:05
neither hive nor flat_map appear to update the indecies, so this is also **wrong** – Mooing Duck Jan 24 '23 at 22:09

score 0 · Answer 2 · answered Jan 24 '23 at 22:13

Since you want the indecies to be updated, then you need a sequential container: vector, list, deque, or similar. vector and deque will copy values around, but list is also slow for virtually all purposes, so none of these is a great fit at first.

Ergo, the best solution is std::vector<std::unique_ptr<Item>>. Then you get very fast accesses, but when removing elements by index, the actual items themselves are not moved, only pointers are rearranged.

score 0 · Answer 3 · answered Jan 28 '23 at 10:59

Another range-based solution consists of:

enumerate each element of v,
remove(each element)_if it is_contained_in those toBeRemoved based on the key,
and finally transform the result retaining only the value.

auto w = enumerate(v)
       | remove_if(is_contained_in(toBeRemoved), key)
       | transform(val);

Full example on Compilier Explorer.

I've also used BOOST_HOF_LIFT from Boost.HOF to turn std::get into an object that I can pass around, based on which I've defined key and val.

Clifford · Answer 4 · 2023-01-23T23:19:17.403

It is a requirement of the vector data structure that it's data is contiguous in memory, so it is not possible to remove an element without moving memory to fill the gap (other than the final element).

A vector is one of the sequence containers. A sequence container with minimal O(1) cost of element removal is a double-linked list (as implemented by std::list). A list can be efficiently accessed sequentially, but unlike a vector is O(n) for random access.

For a discussion of the time complexity of different operations on various container classes see for example https://dev.to/pratikparvati/c-stl-containers-choose-your-containers-wisely-4lc4

Each container has different performance characteristics for different operations. You need to choose one that best fits the operations you will mostly perform. If sequential access and element insertion and removal are key, a list is appropriate. If random access is more critical, it may be worth the hit of using a vector if insertion/removal are infrequent. It may be that neither is optimal in your application, but for the specific situation detailed in your question, a linked-list fits the bill.

Enlico · Answer 5 · 2023-01-25T07:19:24.093

What about using a view-base approach?

Here I show a solution using ranges::any_view. At the bottom of this answer is the complete working code, which shows that the vector-like entity we make up is actually pointing to the very elements of the original std::vector.

Beware, I'm not addressing performance in any way. I don't claim to know much of it, in general, and I know even less of it as related to the cost of the abstractions I'm using below.

The core of the solution is this function for dropping only one element, the one with index i, form the input range:

constexpr auto shoot =
    [](std::size_t i, auto&& range)
        -> any_view<char const&, category::random_access> {
    return concat(range | take(i), range | drop(i + 1));
};

In detail,

given the index i of the item to be removed from the input range,
it creates a range by takeing the first i elements from range (these are the elements before the element of index i),
it creates a range by droping the first i + 1 elements from range (thus retaining the elements after the element of index i),
and finally it concatenates those two ranges
returning the resulting range as an any_view<char const&, category::random_access>, to avoid nesting more and more views for each repeated application of shoot;
category::random_access is what allows a []-based access to the elements.

Given the above, deleting a few elements from the range is as easy as this:

auto w = shoot(3, shoot(9, shoot(10, v)));

However, if you were to call shoot(9, shoot(3, v)) you would be removing the 3rd element first, and then the 9th element of the resulting range, which means that you'd have removed the 3rd and 10th elements with respect to the original vector; this has nothing to do with the range-base approach, but just with providing a function to delete only one element.

Clearly you can build on top of that a function that eliminates all the indices from another range:

sort the indices of elements to be removed,
for-loop on them in reverse (for the reason explained above),
and shoot them one by one (without using any_view we couldn't do view = shoot(n, view); because each application of shoot would change the type of the view):

constexpr auto shoot_many = [](auto&& indices, auto&& range){
    any_view<char const&, category::random_access> view{range};
    sort(indices);
    for (auto const& idx : reverse(indices)) {
        view = shoot(idx, view);
    }
    return view;
};

I have tried another solution for shoot_many, where I would basically index all the elements of the range, filter out those with index contained in indices, and finally transforming to remove the indices. Here's sketch of it:

constexpr auto shoot_many = [](auto&& indices, auto&& range){
    std::set<std::size_t> indices_(indices.begin(), indices.end()); // for easier lookup
                                                                    // (assuming they're not sorted)
    auto indexedRange = zip(range, iota(0)); // I pretty sure there's a view doing this already
    using RandomAccessViewOfChars
        = any_view<char const&, category::random_access>;
    return RandomAccessViewOfChars{
        indexedRange | filter([indices_](auto&& pair){ return indices_.contains(pair.second); })
                     | transform([](auto&& pair){ return pair.first; })};
};

This, however, doesn't work because having filter in the pipe means that we don't know the length of the resulting range until the moment we truly traverse it, which in turn means that the output I'm returning doesn't meet the compile-time requirements for a category::random_access any_view. Sad.

Anyway, here's the solution:

#include <assert.h>
#include <cstddef>
#include <functional>
#include <iostream>
#include <memory>
#include <range/v3/algorithm/sort.hpp>
#include <range/v3/range/conversion.hpp>
#include <range/v3/view/any_view.hpp>
#include <range/v3/view/concat.hpp>
#include <range/v3/view/drop.hpp>
#include <range/v3/view/iota.hpp>
#include <range/v3/view/reverse.hpp>
#include <range/v3/view/take.hpp>
#include <set>
#include <vector>

using namespace ranges;
using namespace ranges::views;

// utility to drop one element from a range and give you back a view
constexpr auto shoot =
    [](std::size_t i, auto&& range)
        -> any_view<char const&, category::random_access> {
    return concat(range | take(i), range | drop(i + 1));
};

constexpr auto shoot_many = [](auto&& indices, auto&& range){
    any_view<char const&, category::random_access> view{range};
    sort(indices);
    for (auto const& idx : reverse(indices)) {
        view = shoot(idx, view);
    }
    return view;
};


int main() {
    // this is the input
    std::vector<char> v = iota('a') | take(26) | to_vector;
    // alternavively,   = {'a', 'b', ...)

    // remove a few elements by index
    auto w = shoot_many(std::vector<int>{3, 10, 9}, v);

    for (std::size_t i = 0, j = 0; i != v.size(); ++i, ++j) {
        if (i == 10 || i == 9 || i == 3) {
            --j;
            std::cout << v[i] << ',' << '-' << std::endl;
        } else {
            std::cout << v[i] << ',' << w[j] << std::endl;

            assert( v[i] ==  w[j]);
            assert(&v[i] == &w[j]);
        }
    }
}

Most optimal way to remove an index from an vector-like data structure without copying data in C++

5 Answers5