Question: how can I get the last item in a python generator in a fast and memory-efficient way?

MWE:

import snscrape.modules.twitter as sntwitter
import time; start = time.time()

query = "Rooty Roo"

obj = sntwitter.TwitterSearchScraper(query)
print(obj) # didn't see much useful besides get_items

cnt = 0
items = obj.get_items()
for item in items:
  cnt += 1
  if cnt % 100 == 0:
    print(cnt)
  # end if
# end for
## the above seems ideal for memory-efficiency but 
## maybe super slow as I have no idea if there are 
## millions or billions of tweets in there. 
## Been running a few minutes and at ~17k so far.
## Not super ideal for playing around...

print(vars(tweet))
print("tweets: ", cnt)
print("executed in: ", time.time() - start)

I guess the above is not a great MWE since it relies on a package, but this is the first time I've encountered a generator, which is what prompted this question :)

Context: I'm trying to learn more about how this package works. I started reading the source but thought playing around and inspecting the data might be faster ¯\_(ツ)_/¯

Memory-Efficient Context: my laptop is turning 10 this year and I think part of the RAM is failing. Theoretically it has 8 GB RAM but using more than 1-2 GB causes browser pages to crash :D

Is this question answered already? Probably, but google search results for 'python get last item of a generator' return results for iterators...

Ryan Farber
  • A generator is just a type of iterator, so results you find that talk about iterators should work for this too. – Samwise May 21 '22 at 14:33
  • The last item in a generator might not even exist. For example, consider a generator that generates all natural numbers - there's no way to get the last one. For the example above, you could just exhaust the generator with `items = list(obj.get_items())` (if it is a generator) and then get `len(items)` – Grismar May 21 '22 at 14:37
  • Do the answers to this [question](https://stackoverflow.com/questions/2138873/cleanest-way-to-get-last-item-from-python-iterator) help at all? – quamrana May 21 '22 at 14:37
  • Your code won't even run, where is `tweet` defined? – Ξένη Γήινος May 21 '22 at 14:43
  • @Thyebri sure it'll run since this is python but yeah it'll crash (I should change tweet to item)... – Ryan Farber May 22 '22 at 14:23
  • @Grismar okay, thanks! Didn't realize generators could be infinite, that is interesting. – Ryan Farber May 22 '22 at 14:28
  • @Samwise and quamrana okay if generators are the same as iterators I guess the link shared works (I had seen that previously but didn't know it applied). So should I close/remove this question do you all think? – Ryan Farber May 22 '22 at 14:28
  • From an answer to this [question](https://stackoverflow.com/questions/2776829/difference-between-pythons-generators-and-iterators): `"Every generator is an iterator, but not vice versa."` – quamrana May 22 '22 at 14:31

1 Answer

The last item of a generator cannot (always) be determined.

For some generators, you cannot predict when (or whether) they will end, so the last element is uncertain:

import random

def random_series():
    while (x := random.randint(1, 10)) > 1:
        yield x


# print random numbers from generator until 1 is generated
for x in random_series():
    print(x)

Others will literally go on forever:

def natural_numbers():
    n = 0
    while True:
        n += 1
        yield n

# prints the first 10 natural numbers, but could go on forever
g = natural_numbers()
for _ in range(10):
    print(next(g))
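
When you only want to poke at a generator that may be very long or infinite, one option is to cap how many items you pull from it. A minimal sketch, assuming the standard library's `itertools.islice` is acceptable here (it is not part of the original answer):

```python
from itertools import islice


def natural_numbers():
    """Infinite generator, same idea as the one above."""
    n = 0
    while True:
        n += 1
        yield n


# islice yields at most 5 items and then stops, so this terminates
# even though natural_numbers() itself never does.
first_five = list(islice(natural_numbers(), 5))
print(first_five)  # [1, 2, 3, 4, 5]
```

The same wrapper could be put around any other generator to inspect a handful of items without waiting for it to finish.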

However, every generator is an iterator, and you can try to get the last item (or the number of items) the same way you can for any other iterator that doesn't flat out tell you, or allow indexing.

For iterables that do (sequences such as lists or tuples):

# if i is some sequence that allows indexing and has a length:
print('last element: ', i[-1])
print('size: ', len(i))

For iterators that don't (but at least end):

items = list(i)  # consume the iterator once and keep the result
print('last element: ', items[-1])
print('size: ', len(items))

However, if you try that on an infinite generator, your code will hang, or more likely crash as soon as it runs out of memory to hold the list. Also, note that an iterator can only be consumed once: a second call to list(i) would return an empty list, so assign the result to a variable if you need it more than once.
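
A small sketch of what happens when a generator is passed to `list()` twice. The generator `three_items` is a made-up example, not something from the package in the question:

```python
def three_items():
    """Hypothetical finite generator used only for illustration."""
    yield "a"
    yield "b"
    yield "c"


i = three_items()
first_pass = list(i)   # consumes the generator
second_pass = list(i)  # nothing is left to consume

print(first_pass)   # ['a', 'b', 'c']
print(second_pass)  # []
```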

In your case:

items = list(obj.get_items())
print("tweets: ", len(items))
print("last tweet: ", items[-1])

Note: as user @kellybundy points out, creating a list is not very memory-efficient. If you don't care about the actual contents, other than the last element, this would work:

for n, last in enumerate(obj.get_items()):
    pass
# n is now the number of items - 1 and last is the last item
# (if the generator was empty, neither n nor last is defined)

This is memory-efficient, but the contents of the generator are now lost.
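
Another idiom that is often suggested for this, shown here as a sketch rather than as part of the original answer, is `collections.deque` with `maxlen=1`, which keeps only the most recent item while it drains the iterator. The helper names `last_item` and `count_and_last` are hypothetical; both return a default for an empty generator instead of failing, and both still need the generator to end.

```python
from collections import deque


def last_item(iterable, default=None):
    """Return the last item of iterable, or default if it is empty."""
    tail = deque(iterable, maxlen=1)  # holds at most one item at a time
    return tail[0] if tail else default


def count_and_last(iterable, default=None):
    """Return (number of items, last item) while storing only one item."""
    count = 0
    last = default
    for last in iterable:
        count += 1
    return count, last


print(last_item(x * x for x in range(4)))       # 9
print(count_and_last(x * x for x in range(4)))  # (4, 9)
print(count_and_last(iter([])))                 # (0, None)
```

With the package from the question, this would presumably be called as `count_and_last(obj.get_items())`.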

Grismar
  • They asked for "memory-efficient", and `list(i)` is rather the opposite of that. – Kelly Bundy May 22 '22 at 20:40
  • I suppose that's fair enough - I added a solution that gets the required answer (but throws everything else out). Note however that OP is getting some tweets from an API, so 1-2GB should more than suffice. In general however, you certainly have a point. – Grismar May 22 '22 at 21:21