19

I need to build a data-structure like this:

map[string]SomeType 

But it must store values for about 10 minutes and then clear it from memory. Second condition is records amount - it must be huge. This data-structure must add at least 2-5K records per second.

So, what is the most correct way in Go to make it?

I'm trying to make goroutine with timeout for each new elemnt. And one(or more) garbage-collector goroutine with channel to receive timeouts and clear elements. But I'm not sure it's the most clear way. Is it Ok to have millions of waiting goroutines with timeouts?

Thanks.

matvey.co
  • 381
  • 1
  • 4
  • 9
  • 2
    If you need this sort of TTL behavior, you aren't going to want to use `map`, or at least you're going to need to add some other data structures in addition to `map`. You may want to take a look at min heaps (for example, in Go, there's [container/heap](http://golang.org/pkg/container/heap/)). – joshlf Aug 25 '14 at 11:08
  • Thanks for that! But using of map is necessary because I need random access to elements by their ID. Any ideas? – matvey.co Aug 25 '14 at 12:18
  • 3
    So in sketch, what I'd do is this (if you want, I can write this up in more detail as an answer): Have a data structure that has both a map (for values) and a heap (for expiration times). Each time you add an entry to the map, it gets inserted into the heap keyed on its expiration time. Each time somebody does a map access operation of some kind, you first pop all of the elements off the heap which have expired (basically, keep going until the expiration times are after `time.Now()`), and remove all of them from the map. Then you perform whatever operation was requested. – joshlf Aug 25 '14 at 13:05
  • 2
    Alternatively, if you'd like the map items to expire even if nobody's using the map (which could be useful to reclaim memory used by other parts of the program), you could run a separate goroutine that just loops popping things off the heap, sleeping until their expiration time comes, and removing the element from the map (you'd need to make sure both the heap and the map were properly synchronized - probably using a `sync.RWMutex` or something similar; this is one of the cases where using channels for synchronization would be overkill). – joshlf Aug 25 '14 at 13:07
  • Have 10 maps and look for elements in all of them. Every minute throw the oldest map away and make a new one. – Nick Craig-Wood Aug 25 '14 at 13:57
  • If your needs start to outgrow a simple in-process data structure, this is something redis is really good at. – JimB Aug 25 '14 at 14:42
  • @JimB I'm already using Redis for this. That's why I'm searching for alternative. Redis is not good with cleaning old keys when insertion rate is high. – matvey.co Aug 25 '14 at 15:34
  • @matvey.co, thanks, good to know. It does seem a little strange that only 2-5k RPS causes problems; I may have to test that out. – JimB Aug 25 '14 at 15:43
  • @Nick that'd just a garbage collection nightmare, all the wasted memory! – OneOfOne Aug 25 '14 at 19:47
  • 1
    You might want to consider using https://github.com/ReneKroon/ttlcache – blasrodri Jul 05 '21 at 12:28

3 Answers3

41

You will have to create a struct to hold your map and provide custom get/put/delete funcs to access it.

Note that 2-5k accesses per second is not really that much at all, so you don't have to worry about that.

Here's a simple implementation:

type item struct {
    value      string
    lastAccess int64
}

type TTLMap struct {
    m map[string]*item
    l sync.Mutex
}

func New(ln int, maxTTL int) (m *TTLMap) {
    m = &TTLMap{m: make(map[string]*item, ln)}
    go func() {
        for now := range time.Tick(time.Second) {
            m.l.Lock()
            for k, v := range m.m {
                if now.Unix() - v.lastAccess > int64(maxTTL) {
                    delete(m.m, k)
                }
            }
            m.l.Unlock()
        }
    }()
    return
}

func (m *TTLMap) Len() int {
    return len(m.m)
}

func (m *TTLMap) Put(k, v string) {
    m.l.Lock()
    it, ok := m.m[k]
    if !ok {
        it = &item{value: v}
        m.m[k] = it
    }
    it.lastAccess = time.Now().Unix()
    m.l.Unlock()
}

func (m *TTLMap) Get(k string) (v string) {
    m.l.Lock()
    if it, ok := m.m[k]; ok {
        v = it.value
        it.lastAccess = time.Now().Unix()
    }
    m.l.Unlock()
    return

}

playground

note(2020-09-23): for some reason the time resolution on the current version of the playground is way off, this works fine, however to try on the playground you have to change the sleep to 3-5 seconds.

OneOfOne
  • 95,033
  • 20
  • 184
  • 185
  • 1
    That looks like I was searching for! Thanks. – matvey.co Aug 25 '14 at 15:35
  • 1
    Thank you for the code. I was wondering about this exact problem a few days ago and I thought a similar solution. Scanning a whole map every second possibly is too much, but I suppose it depends on the application. – siritinga Aug 27 '14 at 15:45
  • 2
    Another solution would be to keep a map and a linked list. The linked list contains pairs of (key, timestamp). You push on the back and get from the front, so keys are time-sorted and it is very fast to know which keys can be removed. Could be even instantaneous if you set the right Sleep duration. This uses more memory but it would be faster. – siritinga Aug 27 '14 at 15:49
  • @matvey.co if the answer helped you, you could mark it as the correct answer by hitting the little check mark under the upvote/downvote buttons. siritinga that highly depends on how many items on the map and how many get/set per second. – OneOfOne Aug 27 '14 at 15:50
  • @siritinga that would work with a slice too with a pointer to the item (and add the key as a field to the item struct), otherwise you can't update the lastAccess value on get. – OneOfOne Aug 27 '14 at 15:51
  • 1
    @OneOfOne you are right. I understood that he needs the data for 10 minutes since the insertion, not from the last access. I suppose it could be interpreted both ways. If the max number of elements is known, then of course a circular buffer using a slice would be better than a a list, or a heap as suggested previously. – siritinga Aug 27 '14 at 15:55
  • Good idea about the circular buffer, I didn't think of that! – OneOfOne Aug 27 '14 at 18:22
5

Take a look at buntdb.

tinykv is no longer being maintained.

Just for the record, I had the same problem and wrote tinykv package which uses a map internally.

  • It uses a heap of time.Time for timeouts, so it does not ranges over the whole map.
  • A max interval can be set when creating an instance. But actual intervals for checking the timeout can be any value of time.Duration greater than zero and less than max, based on the last item that timed out.
  • It provides CAS and Take functionality.
  • A callback (optional) can be set which notifies which key and value got timed out.
  • Timeouts can be explicit or sliding.
Kaveh Shahbazian
  • 13,088
  • 13
  • 80
  • 139
1

I suggest to use Map of golang's built-in package sync, it's very easy to use and already handles concurrency https://golang.org/pkg/sync/#Map

Hieu Vo
  • 3,105
  • 30
  • 31