
I am working on a project in Swift, using a dictionary.

This dictionary is of the type [String : [Posting]]. I have around 200k different "terms" (keys) to insert in it, and for each term I have around 500 to 1000 objects to append to a list. I know it's a weird practice but I don't have the choice and I must deal with all those elements.

The issue is that this is very, very slow, and it gets worse as the dictionary gets bigger. I tried switching to an NSMutableDictionary, with no luck.

My addTerm function is called every time I need to insert an element:

    func addTerm(_ term: String, withId id: Int, atPosition position: Int) {
        if self.map[term] == nil {
            self.map[term] = [Posting]()
        }

        if self.map[term]!.last?.documentId == id {
            self.map[term]!.last?.addPosition(position)
        }
        else {
            self.map[term]!.append(Posting(withId: id, atPosition: position, forTerm: term))
        }
    }

EDIT: I realized that it's not the dictionary that causes all this lag, but actually the arrays it contains. Arrays re-allocate way too much when adding new elements, and the best I could do was to replace them with ContiguousArray.
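For illustration, here is a minimal sketch of the ContiguousArray swap described in the edit. The `Posting` class below is a stand-in matching the calls in the question (its real definition isn't shown), and the `reserveCapacity` figure is a guess based on the stated 500–1000 postings per term. Note that `map[term]!.append(...)` still triggers the copy-on-write behaviour discussed in the answers below; ContiguousArray only changes the buffer layout, not the copying.

```swift
// Hypothetical stand-in for the question's Posting class.
final class Posting {
    let documentId: Int
    private(set) var positions: [Int]
    let term: String

    init(withId id: Int, atPosition position: Int, forTerm term: String) {
        self.documentId = id
        self.positions = [position]
        self.term = term
    }

    func addPosition(_ position: Int) {
        positions.append(position)
    }
}

var map = [String: ContiguousArray<Posting>]()

func addTerm(_ term: String, withId id: Int, atPosition position: Int) {
    if map[term] == nil {
        var list = ContiguousArray<Posting>()
        list.reserveCapacity(1_000)  // rough upper bound of postings per term
        map[term] = list
    }
    if let last = map[term]!.last, last.documentId == id {
        last.addPosition(position)
    } else {
        map[term]!.append(Posting(withId: id, atPosition: position, forTerm: term))
    }
}
```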

Scaraux
  • "I don't have the choice and I must deal with all those elements." Any way you could explain why it "has" to be done this way? Maybe there's a different way that could be found. – TNguyen Oct 04 '18 at 00:07
  • Because it's for a class project, a Search Engine, meant to be developed using either Java or C#, and I decided to do it in Swift. The same implementation works perfectly in Java but Swift doesn't seem to like big dictionaries. – Scaraux Oct 04 '18 at 00:09
  • Dictionaries use hashes for key lookup, and should give nearly `O(1)` (constant time) performance. It doesn't really make sense that it would get "very very slow". As Palle suggests in his answer, you should use instruments to time-profile your code. – Duncan C Oct 04 '18 at 00:20
  • I know! But with 36000 files containing each 250 words of 10 letters, O(1) is still forever... – Scaraux Oct 04 '18 at 00:21
  • I will try to profile – Scaraux Oct 04 '18 at 00:21
  • _"But with 36000 files containing each 250 words of 10 letters"_ The code you posted doesn't have any reference to that. It might be building the dictionary that is taking that long. – Rakesha Shastri Oct 04 '18 at 05:30
  • I know, but no, it's precisely the append that takes long. I know why: Swift arrays are terrible, they reallocate all the time... – Scaraux Oct 04 '18 at 05:31

3 Answers


The general approach when your code is too slow is to profile it in Instruments to figure out which lines actually take the longest, and to go from there. There could be bottlenecks somewhere else. Also note that running your app directly from within Xcode creates a debug build, which sacrifices performance for debuggability. A release build may perform much better.

Also, if your program takes up large amounts of memory, the system may struggle to make this memory available to your app. On non-iOS platforms, this will lead to swapping memory out to disk, which will impact the performance of your app significantly, as the system cannot anticipate which elements of the dictionary will be accessed next.

If the memory requirements are not responsible for the slowdown, here are a few approaches that I would try:

  • If you can estimate the number of items that you want to insert into a dictionary, you can use dictionary.reserveCapacity(numberOfItems). As the dictionary grows, it may need to be resized, which may require rebuilding the hash table, which the dictionary type uses internally. This approach also works for arrays.

  • Swift provides methods to automatically group items into a dictionary using a common key: Dictionary(grouping: collection, by: { item in item.property }). This approach may be computationally more efficient, as everything could be processed in one batch.

  • Another approach may be to use other data types, such as a tree map, which would not require frequent reallocations. Swift does however not provide such a type in the standard library.
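The first two suggestions can be sketched as follows. The `Item` type and its properties are made up for illustration; `Dictionary(grouping:by:)` and `reserveCapacity(_:)` are standard-library API.

```swift
// Hypothetical item type standing in for whatever is being indexed.
struct Item {
    let term: String
    let documentId: Int
}

let items = [
    Item(term: "swift", documentId: 1),
    Item(term: "swift", documentId: 2),
    Item(term: "dictionary", documentId: 1),
]

// 1. Reserve capacity up front so the hash table doesn't need to be
//    rebuilt repeatedly as the dictionary grows.
var map = [String: [Item]]()
map.reserveCapacity(200_000)

// 2. Build the whole dictionary in one batch, grouped by a common key.
let grouped = Dictionary(grouping: items, by: { $0.term })
// grouped["swift"] now holds both items with term "swift".
```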

Palle

This is a fairly common performance trap, one that has come up in other Q&As as well.

The issue stems from the fact that the array you're mutating in the expression self.map[term]!.append(...) is a temporary mutable copy of the underlying array in the dictionary's storage. This means that the array is never uniquely referenced and so always has its buffer re-allocated.

This situation will be fixed in Swift 5 with the unofficial introduction of generalised accessors, but until then, one solution (as mentioned in the Q&As above) is to use Dictionary's subscript(_:default:), which from Swift 4.1 can mutate the value directly in storage.
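For a single mutation, the subscript(_:default:) idiom looks like this (a minimal illustration with a made-up dictionary, requiring Swift 4.1+):

```swift
var counts = [String: [Int]]()

// Appends directly into the dictionary's storage -- the array is
// uniquely referenced, so no temporary copy is made.
counts["a", default: []].append(1)
counts["a", default: []].append(2)
// counts["a"] == [1, 2]
```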

Your case isn't quite a straightforward case of applying a single mutation, though, so you need some kind of wrapper function in order to give yourself scoped access to the mutable array.

For example, this could look like:

class X {

  private var map: [String: [Posting]] = [:]

  private func withPostings<R>(
    forTerm term: String, mutations: (inout [Posting]) throws -> R
  ) rethrows -> R {
    return try mutations(&map[term, default: []])
  }

  func addTerm(_ term: String, withId id: Int, atPosition position: Int) {

    withPostings(forTerm: term) { postings in
      if let posting = postings.last, posting.documentId == id {
        posting.addPosition(position)
      } else {
        postings.append(Posting(withId: id, atPosition: position, forTerm: term))
      }
    }

  }
  // ...
}
Hamish

I had the same problem. It was crazy slow for 200K entries... so I made a class and put the array in there:

class MyIndex
{
    var entries: [Posting] = []
}

var map = [String: MyIndex]()

seems to work pretty fast now
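This works because the class wrapper gives the array reference semantics: the array is mutated in place through the reference instead of being copied out of the dictionary on every append. A minimal runnable sketch of this approach, with a made-up `Posting` stand-in, might look like:

```swift
// Hypothetical stand-in for the question's Posting class.
final class Posting {
    let documentId: Int
    init(documentId: Int) { self.documentId = documentId }
}

final class MyIndex {
    var entries: [Posting] = []
}

var map = [String: MyIndex]()

func addTerm(_ term: String, withId id: Int) {
    if let index = map[term] {
        // Reference semantics: mutates the array inside the dictionary
        // without copying it out first.
        index.entries.append(Posting(documentId: id))
    } else {
        let index = MyIndex()
        index.entries.append(Posting(documentId: id))
        map[term] = index
    }
}
```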