
I am writing a function that takes a large list of strings and classifies them.

Each string, once classified, is deleted from the list.

Sometimes the function needs to delete a string at an arbitrary index, and sometimes it needs to delete the string at the leftmost position.

About 2/3 of the time it deletes from the leftmost position, and about 1/3 of the time from an arbitrary place in the middle.

I know that using a list as the data structure to hold the strings makes deleting the leftmost cell O(n) and deleting the i'th cell O(n-i), since every cell to the right of the deleted one is moved one step left on each deletion.

I then tried using collections.deque instead of list, but it actually took more time (about 4/3 as much) than the original list, possibly because of the deletions at arbitrary positions.

What data structure will perform best under these assumptions? Is there anything better than just using a list?

With the list, I delete from the leftmost cell using `list.pop(0)` and remove from an arbitrary index by value using `list.remove(value)`.
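For reference, a minimal sketch of the two deletion operations on a toy list (the strings and the removed value are illustrative):

```python
strings = ["apple", "banana", "cherry", "date", "elderberry", "fig"]

# ~2/3 of deletions: remove the leftmost string.
# pop(0) shifts every remaining element one step left, so it is O(n).
leftmost = strings.pop(0)

# ~1/3 of deletions: a related string is removed by value.
# remove() first scans for the value (O(n)) and then shifts the tail left.
strings.remove("date")

print(strings)  # ['banana', 'cherry', 'elderberry', 'fig']
```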

  • show us exactly how you are using the deque – juanpa.arrivillaga Sep 01 '21 at 16:54
  • How large is the list, and how do you find those arbitrary indexes? – no comment Sep 01 '21 at 16:54
  • Just using `leftpop()` for deleting from left and `remove(index)` for deleting in arbitrary index – user183748292 Sep 01 '21 at 16:55
  • Perhaps reverse your list (and adjust the algorithm accordingly) so you delete from the right instead? – no comment Sep 01 '21 at 16:55
  • Or perhaps do lazy deletions, i.e., instead of deleting at arbitrary indexes, replace the value with `None` and delete it when you come across it later (when it's the last list element). We could help you better if you told us more about your overall procedure. – no comment Sep 01 '21 at 16:59
  • @don'ttalkjustcode ~ one million strings, the procedure can sometimes classify a string and then deduce the classification of other related strings, hence deleting them by value from the list – user183748292 Sep 01 '21 at 17:04
  • So when you delete them "by value", how do you know where they are? – no comment Sep 01 '21 at 17:06
  • @don'ttalkjustcode These are the arbitrary position deletions – user183748292 Sep 01 '21 at 17:10
  • That doesn't answer the question. My point is that slowness might rather be a result of how you find the deletion indexes instead of the actual deletions. – no comment Sep 01 '21 at 17:11
  • I just use `list.remove(value)` – user183748292 Sep 01 '21 at 17:13
  • So exactly what I thought. – no comment Sep 01 '21 at 17:13
  • Yes, the first comment was misleading. I edited the post and added this description. – user183748292 Sep 01 '21 at 17:24
  • I don't have time now and would like to know more to proceed anyway, but have a look at this [demo](https://tio.run/##pY7LCsIwEEX3@YrZSNJaSlUUKfRLxEVqkzaQF9NR8OtjqF240oXLe@dc5sQnTcEfzhFT0hgckHHKEBgXA9KaGJPQwYXrEPgVStg1ZXmCbW56iWvTMMrMmxdWun6QLcgalQsPJRawqMDfXa@wy3ixrFhE40nwBTN@BJl/TgrQjBO1vAK@OdZ7zWEDlIMz1ppZ3YIfZl78oRVDFM0Xn3yPHzpWaWoBfvik9AI). – no comment Sep 01 '21 at 17:27
  • ```Deque``` is suitable for this case. I don't understand why it's slower in your case. – Ram Sep 01 '21 at 17:28
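The lazy-deletion idea from the comments can be sketched like this (a sketch only, assuming unique strings; the helper names are illustrative):

```python
strings = ["a", "b", "c", "d", "e"]
removed = set()  # values deleted "by value" but not yet physically removed
i = 0            # index of the current leftmost live element

def pop_leftmost():
    global i
    # Skip entries that were lazily removed earlier.
    while strings[i] in removed:
        removed.discard(strings[i])
        i += 1
    value = strings[i]
    i += 1
    return value

def remove_value(value):
    removed.add(value)  # O(1): defer the real deletion

remove_value("b")
pop_leftmost()  # 'a'
pop_leftmost()  # 'c' ('b' is skipped)
```

Since nothing is ever shifted, both operations run in amortized O(1).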

2 Answers

1

If you know exactly where you want to delete, you might be looking for a linked list implementation.

Do note that deleting a node from a linked list is O(1) once you have a reference to that node.

However, searching for an element is O(n).

EDIT: If you instead care about the values to be deleted, a set is probably what you're looking for. Do note that a set is an unordered data structure and only allows deleting by value.
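Combining this with the dict idea from the comments, a minimal sketch (assuming unique strings; the class and method names are illustrative) might look like:

```python
class Node:
    __slots__ = ("value", "prev", "next")

    def __init__(self, value):
        self.value = value
        self.prev = self.next = None

class LinkedStrings:
    """Doubly linked list plus a dict from value to node, so both
    leftmost deletion and deletion by value are O(1)."""

    def __init__(self, strings):
        self.head = self.tail = None
        self.nodes = {}
        for s in strings:
            node = Node(s)
            self.nodes[s] = node
            if self.tail is None:
                self.head = self.tail = node
            else:
                node.prev = self.tail
                self.tail.next = node
                self.tail = node

    def _unlink(self, node):
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

    def popleft(self):
        node = self.head
        self._unlink(node)
        del self.nodes[node.value]
        return node.value

    def remove(self, value):
        # The dict lookup replaces the O(n) search done by list.remove().
        self._unlink(self.nodes.pop(value))

ls = LinkedStrings(["a", "b", "c", "d"])
ls.popleft()    # 'a'
ls.remove("c")  # no scan needed
```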

Sai Prashanth
  • Depends on how they find the deletion index. If they actually don't come from an index but from a *value* to delete, they could use a dict mapping the values to linked list nodes. Then "searching" should be O(1) as well. – no comment Sep 01 '21 at 17:01
  • but I believe that using a dict is O(log n) though – Sai Prashanth Sep 01 '21 at 17:02
  • No, it's O(1), except for very special cases (like [this](https://stackoverflow.com/q/69017358/16759116) but for dict). Especially for strings, as there's randomization involved so it's hard to make it slow even if you try. – no comment Sep 01 '21 at 17:04
1

Try an OrderedDict with your strings as keys.

Benchmark with just 30,000 strings, 20,000 deletions from left and 10,000 "arbitrary" removals:

1476 ms  1490 ms  1491 ms  using_list
  10 ms    11 ms    11 ms  using_OrderedDict

Instead of

strings_list.pop(0)
strings_list.remove(value)

use

strings_dict.popitem(False)
strings_dict.pop(value)

This assumes you have no duplicate strings, which seems likely. If you do, then use the dictionary values as the frequencies.
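For the duplicate case, the frequency idea can be sketched as follows (helper names are illustrative; note that duplicates collapse onto the position of their first occurrence, so the "leftmost" semantics differ slightly from a plain list):

```python
from collections import OrderedDict

strings = ["a", "b", "a", "c", "b"]

# One key per distinct string, mapped to its number of occurrences.
counts = OrderedDict()
for s in strings:
    counts[s] = counts.get(s, 0) + 1

def pop_leftmost(counts):
    # Decrement the first key's count; drop the key when it hits zero.
    key = next(iter(counts))
    counts[key] -= 1
    if counts[key] == 0:
        del counts[key]
    return key

def remove_value(counts, value):
    counts[value] -= 1
    if counts[value] == 0:
        del counts[value]

pop_leftmost(counts)       # 'a' (one 'a' still remains)
remove_value(counts, "b")
```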

Benchmark code (Try it online!):

from timeit import timeit
from collections import OrderedDict
from random import randrange

def using_list(strings, removes):
    for string in removes:
        # Two leftmost deletions per one arbitrary removal (the 2:1 ratio)
        strings.pop(0)
        strings.pop(0)
        strings.remove(string)

def using_OrderedDict(strings, removes):
    strings = OrderedDict.fromkeys(strings)
    for string in removes:
        # popitem(last=False) removes the first (leftmost) item
        strings.popitem(False)
        strings.popitem(False)
        strings.pop(string)

# Build testcase with 30,000 elements
strings = list(map(str, range(10_000, 40_000)))
copy = strings.copy()
removes = []
for _ in range(10_000):
    copy.pop(0)
    copy.pop(0)
    i = randrange(len(copy))
    removes.append(copy.pop(i))

# Benchmark
for _ in range(3):
    for func in using_list, using_OrderedDict:
        times = []
        for _ in range(3):
            copy = strings.copy()
            t = timeit(lambda: func(copy, removes), number=1)
            times.append(t)
        times.sort()
        print(*('%4d ms ' % (t * 1e3) for t in times), func.__name__)
    print()
no comment