In this question, I am creating a dict of empty sets by iterating through a list. Then I iterate through the same list again, filling those sets. An MRE:
# imports we need
import string  # needed for string.ascii_lowercase below
import time
import numpy as np
np.random.seed(42)
What I am doing
An example list of letters. Note that at least one letter will appear more than once.
letters=[np.random.choice([letter for letter in string.ascii_lowercase]) for _ in range(1000)]
Result:
['w', 'n', 'k', 'o', 'm', 'r', ...
Creating a dict with letters as keys, empty sets as values:
letterdict={letter:set() for letter in letters}
Iterating through the letters list again, the set stored under each letter is filled with the indices at which that letter appears in the letters list:
for index, letter in enumerate(letters):
    letterdict[letter].add(index)
letterdict will look like:
{'w': {0, 12, 62, 67, 69, ...
How long it took
This process took:
start = time.time()
letterdict={letter:set() for letter in letters}
for index, letter in enumerate(letters):
    letterdict[letter].add(index)
end = time.time()
print(end-start)
0.000538... sec.
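A single time.time() difference at this scale can be noisy. A minimal sketch of a more repeatable measurement, assuming the standard-library timeit module. The function name two_pass, the use of random.choices in place of np.random.choice, and the repeat counts are assumptions for illustration, so this will not reproduce the exact number above:

```python
import random
import string
import timeit

random.seed(42)
# Stand-in for the letters list from the MRE (stdlib random, not NumPy).
letters = random.choices(string.ascii_lowercase, k=1000)

def two_pass(items):
    """The two-loop approach from the MRE, wrapped in a function for timing."""
    d = {item: set() for item in items}
    for i, item in enumerate(items):
        d[item].add(i)
    return d

# Run the function 1000 times per measurement, take 5 measurements,
# and report the best one divided by the number of runs.
timings = timeit.repeat(lambda: two_pass(letters), number=1000, repeat=5)
best = min(timings) / 1000
print(f"{best:.6f} s per run")
```

Taking the minimum over several repeats is the usual way to reduce the influence of other processes on the measurement.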
The question
Is there a way to make the creation of letterdict quicker? After all, I am iterating through letters twice.
My thoughts: if I could do it in one loop, then on encountering a letter for the first time I could create a set and put the index of the letter in it. On encountering the letter again, it should not reset the set, just add the index. However, checking whether a letter has already been encountered seems tedious (i.e. slow).
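The single-pass idea above could be sketched as follows. This is only a sketch under assumptions, not a benchmarked answer: it uses collections.defaultdict from the standard library, which creates the empty set the first time a key is accessed, so no explicit "already encountered" check is written out. The helper name build_index and the short sample list are hypothetical, chosen for illustration.

```python
from collections import defaultdict

def build_index(items):
    """Map each distinct item to the set of indices where it occurs, in one pass."""
    index = defaultdict(set)  # a missing key gets a fresh empty set on first access
    for position, item in enumerate(items):
        index[item].add(position)
    return dict(index)  # convert back to a plain dict

sample = ['w', 'n', 'k', 'w', 'n', 'w']  # hypothetical short stand-in for letters
print(build_index(sample))
```

Writing the loop body as letterdict.setdefault(letter, set()).add(index) on a plain dict would do the same without an import, though it constructs a throwaway set on every iteration. Whether either variant is actually faster than the two-loop version would need to be measured.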
In the MRE, assume that we do not know in advance what all the letters are, so replacing the first loop with {letter:set() for letter in string.ascii_lowercase} is not really useful.