4

I need to efficiently implement a fixed size FIFO in python or numpy. And I might have different such FIFOs, some for integers, some for strings, etc. In this FIFO, I will need to access each element by its index.

Concern for efficiency is because these FIFOs will be used at the core of a program that is expected to run for several days continuously, and large volume of data is expected to pass through them. So not only will the algorithm need to be time efficient, it also has to be memory efficient.

Now in other languages like C or Java, I would implement this efficiently using a circular buffer, and string pointers (for the string FIFOs). Is that the an efficient approach in python/numpy, or is there a better solution?

Specifically, which of these solutions is the most efficient:

(1) dequeue with maxlen value set: (what will be the effect of garbage collection on the dequeue's efficiency?)

import collections
l = collections.deque(maxlen=3)
l.append('apple'); l.append('banana'); l.append('carrot'); l.append('kiwi')
print(l, len(l), l[0], l[2])
> deque(['banana', 'carrot', 'kiwi'], maxlen=3) 3 banana kiwi

(2) list subclass solution (taken from Python, forcing a list to a fixed size):

class L(list):
    def append(self, item):
        list.append(self, item)
        if len(self) > 3: self[:1]=[]
l2.append('apple'); l2.append('banana'); l2.append('carrot'); l2.append('kiwi')
print(l2, len(l2), l2[2], l2[0])
> ['banana', 'carrot', 'kiwi'] 3 kiwi banana

(3) a normal numpy array. But this limits the size of the strings, so how to specify the max string size for this?

a = np.array(['apples', 'foobar', 'cowboy'])
a[2] = 'bananadgege'
print(a)
> ['apples' 'foobar' 'banana']
# now add logic for manipulating circular buffer indices

(4) an object version of above, but python numpy array of arbitrary length strings says that using an object undoes the benefits of numpy

a = np.array(['apples', 'foobar', 'cowboy'], dtype=object)
a[2] = 'bananadgege'
print(a)
> ['apples' 'foobar' 'bananadgege']
# now add logic for manipulating circular buffer indices

(5) or is there a more efficient solution than the ones presented above?

BTW, my strings have a max upper bound in their length, if that is of any help..

R71
  • 4,283
  • 7
  • 32
  • 60

1 Answers1

2

I'd use NumPy. To specify the maximum string length, use a dtype like so:

np.zeros(128, (str, 32)) # 128 strings of up to 32 characters
John Zwinck
  • 239,568
  • 38
  • 324
  • 436
  • Thanks @John. My intuition is also to use numpy. And thanks for telling how to set maxsize for the strings. But I will wait for sometime to get a more detailed answer why the numpy option is better than the others. – R71 Jul 30 '18 at 08:06
  • @R71: You're welcome. What you're asking for now is rather off-topic, as it is basically an opinion poll ("which of these four workable options should I use?"). You may as well implement two or three and see how it goes. – John Zwinck Jul 30 '18 at 11:11