How to print the progress of a list comprehension in python?

Question

In my method I have to return a list of lists. I would like to use a list comprehension for performance reasons, since the list takes about 5 minutes to create.

[[token.text for token in document] for document in doc_collection]

Is there a way to print the progress, i.e. which document is currently being processed? Something like this:

[[token.text for token in document] 
  and print(progress) for progress, document in enumerate(doc_collection)]

Thanks for your help!

@KlausD. For sure this would work, but is there no possibility to add it in the comprehension? Thanks anyway! — rakael, Jun 08 '18 at 08:06
@KlausD. a `for` loop is way slower than `list comprehension` when creating lists — Gsk, Jun 08 '18 at 08:06
@Chris_Rands Nice find. But I think this question is the better one (shorter and clearer, and no `pandas` usage), so it might be better to close the older question as a dupe of this one. We just need a "Don't do that; use a for loop instead" answer, then we're all set. — Aran-Fey, Jun 08 '18 at 08:16
@Aran-Fey "But I think this question is the better one " okay so why it has only one upvote (which isn't yours?) — Jean-François Fabre, Jun 08 '18 at 08:18
@Aran-Fey I see your point, but I answered the other question with both the `print() or` and side function ideas and a bit more explanation than given in these answers, but I guess I'm biased *sigh* — Chris_Rands, Jun 08 '18 at 08:19
@Chris_Rands you got some votes for that recently though ? :) — Jean-François Fabre, Jun 08 '18 at 08:21
@Chris_Rands I do think that the answers in the older question are better; I just don't like the verbosity (and the `pandas` usage) of the question. This one here is easier to understand for a wider audience. Given the circumstances, I think nobody would blame you for re-posting your answer here :) — Aran-Fey, Jun 08 '18 at 08:27
@Aran-Fey I guess the alternative would be to improve the other question by stripping it back to the essential parts — Chris_Rands, Jun 08 '18 at 08:55

ted · Accepted Answer · 2021-03-06T00:24:16.077

49

`tqdm`

Using the tqdm package, a fast and versatile progress bar utility:

pip install tqdm

from tqdm import tqdm

def process(token):
    return token['text']

l1 = [{'text': k} for k in range(5000)]
l2 = [process(token) for token in tqdm(l1)]

100%|███████████████████████████████████| 5000/5000 [00:00<00:00, 2326807.94it/s]

No requirement

1/ Use a side function

def report(index):
    if index % 1000 == 0:
        print(index)

def process(token, index, report=None):
    if report:
        report(index) 
    return token['text']

l1 = [{'text': k} for k in range(5000)]

l2 = [process(token, i, report) for i, token in enumerate(l1)]

2/ Use `and` and `or` operators

def process(token):
    return token['text']

l1 = [{'text': k} for k in range(5000)]
l2 = [(i % 1000 == 0 and print(i)) or process(token) for i, token in enumerate(l1)]

3/ Use both

def process(token):
    return token['text']

def report(i):
    i % 1000 == 0 and print(i)

l1 = [{'text': k} for k in range(5000)]
l2 = [report(i) or process(token) for i, token in enumerate(l1)]

All 3 methods print:

0
1000
2000
3000
4000

How 2 works

i % 1000 == 0 and print(i): and evaluates its second operand only if the first one is truthy, so print(i) runs only when i % 1000 == 0.
or process(token): or returns its first operand if it is truthy; otherwise it evaluates and returns the second one. Here the first operand is always falsy, so process(token) is always evaluated:
- If i % 1000 != 0, the first operand is False and process(token) is added to the list.
- Otherwise, the first operand is None (because print returns None), and or likewise adds process(token) to the list.
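The short-circuit behaviour described above can be checked in isolation. This is only a minimal sketch: noisy() and the calls list are hypothetical helpers introduced here to record which operands actually get evaluated; they are not part of the answer's code.

```python
calls = []

def noisy(label, value):
    # Hypothetical helper: remember that this operand was evaluated.
    calls.append(label)
    return value

# `and` stops at the first falsy operand, so the right side never runs.
result_and = noisy("left", False) and noisy("right", "unused")

# A falsy left operand (False or None) makes `or` evaluate and return
# the right operand.
result_or_false = False or "processed"
result_or_none = None or "processed"

# A truthy left operand short-circuits `or`: the right side is skipped.
result_or_truthy = noisy("truthy", "first") or noisy("skipped", "second")

print(result_and, result_or_false, result_or_none, result_or_truthy)
print(calls)  # only the operands that were really evaluated
```

In method 2 the left side of or is always False or None, so the right side is always evaluated and the list receives exactly what process(token) returns, even if that value is itself falsy.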

How 3 works

Similarly to 2: because report(i) does not return anything, it evaluates to None, and or adds process(token) to the list.

edited Mar 06 '21 at 00:24

answered Jun 08 '18 at 08:08

ted


2

instead of using a `global i`, I would go with `enumerate` and pass the `index` to the `function` – Ma0 Jun 08 '18 at 08:14
1

@Ev.Kounis and factor out the reporting part using a callback, too (code edited to fix both points). – bruno desthuilliers Jun 08 '18 at 08:35
This is both slower and less readable than a `for` loop. In my (admittedly very limited) tests, Alex's solution takes 10 seconds, a `for` loop takes 13, and this one takes 17. – Aran-Fey Jun 08 '18 at 08:37
@Aran-Fey The functionality is different though; one cannot compare them directly. – Ma0 Jun 08 '18 at 08:42
@Ev.Kounis Huh? What's different? The result is the same, as far as I can tell... – Aran-Fey Jun 08 '18 at 08:43
You can turn the reporting on and off and there is an `if` statement for the `print`s. Alex's answer just `print`s everything. Not sure how your `for` loop looks like. – Ma0 Jun 08 '18 at 08:45
This is the standard method. But you really, really should use `if index % 1000 == 0 and index > 0:` in the test - it is much cleaner. :) – Björn Lindqvist Mar 15 '19 at 01:23
@ted, why do you pass `tok` to `report`? The function does nothing with this argument, so imo it can be omitted. – Qaswed Aug 19 '19 at 12:27
This is true, it's more of a display for how it could be used – ted Aug 20 '19 at 21:02

score 4 · Answer 2 · answered Jun 08 '18 at 08:07

4

doc_collection = [[1, 2],
                  [3, 4],
                  [5, 6]]

result = [print(progress) or
          [str(token) for token in document]
          for progress, document in enumerate(doc_collection)]

print(result)  # [['1', '2'], ['3', '4'], ['5', '6']]

I don't consider this good or readable code, but the idea is fun.

It works because print always returns None so print(progress) or x will always be x (by the definition of or).

answered Jun 08 '18 at 08:07

Alex Hall


2

This should NOT be the accepted answer - as far as I'm concerned, such a code will not pass a code review. Ted's solution is the correct way to solve the problem. – bruno desthuilliers Jun 08 '18 at 08:33

score 4 · Answer 3 · answered Aug 04 '21 at 09:08

4

Just do:

from time import sleep
from tqdm import tqdm

def foo(i):
    sleep(0.01)
    return i

[foo(i) for i in tqdm(range(1000))]

For Jupyter notebook:

from tqdm.notebook import tqdm

answered Aug 04 '21 at 09:08

noyk


1

Consider using `from tqdm import trange`. Then `[foo(i) for i in trange(1000)]`. It is a shortcut for `tqdm(range(N))` – Elijas Dapšauskas Apr 01 '23 at 20:01

score 2 · Answer 4 · answered Jun 08 '18 at 10:02

def show_progress(it, milestones=1):
    for i, x in enumerate(it):
        yield x
        processed = i + 1
        if processed % milestones == 0:
            print('Processed %s elements' % processed)

Simply apply this function to anything you're iterating over. It doesn't matter if you use a loop or list comprehension and it's easy to use anywhere with almost no code changes. For example:

doc_collection = [[1, 2],
                  [3, 4],
                  [5, 6]]

result = [[str(token) for token in document]
          for document in show_progress(doc_collection)]

print(result)  # [['1', '2'], ['3', '4'], ['5', '6']]

If you only wanted to show progress for every 100 documents, write:

show_progress(doc_collection, 100)
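As a rough illustration of the "loop or list comprehension" point, the same generator can wrap the iterable of an ordinary for loop. The function body is repeated from the answer so that the block runs on its own; the four-document doc_collection and the milestone of 2 are example values assumed here.

```python
def show_progress(it, milestones=1):
    # Same generator as in the answer, repeated to keep this block self-contained.
    for i, x in enumerate(it):
        yield x
        processed = i + 1
        if processed % milestones == 0:
            print('Processed %s elements' % processed)

doc_collection = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Plain for loop: a message appears after every second document.
result = []
for document in show_progress(doc_collection, 2):
    result.append([str(token) for token in document])

print(result)
```

Because the print call sits after the yield, each message appears only once the consumer has finished with that element and asks for the next one.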

score 2 · Answer 5 · answered Aug 05 '20 at 02:19

2

Here is my implementation.

pip install progressbar2

from progressbar import progressbar
new_list = [your_function(list_item) for list_item in progressbar(old_list)]

You will see a progress bar while running the code block above.

answered Aug 05 '20 at 02:19

James Chang


score 0 · Answer 6 · answered Aug 19 '19 at 12:56

I felt the need to make @ted's answer (imo) more readable and to add some explanation.

Tidied up solution:

# Function to print the index, if the index is evenly divisible by 1000:
def report(index):
    if index % 1000 == 0:
        print(index)

# The function the user wants to apply on the list elements
def process(x, index, report):
    report(index)  # Call of the reporting function
    return 'something ' + x  # ! Just an example, replace with your desired application

# !Just an example, replace with your list to iterate over
mylist = ['number ' + str(k) for k in range(5000)]

# Running a list comprehension
[process(x, index, report) for index, x in enumerate(mylist)]

Explanation of enumerate(mylist): the enumerate function yields each element of an iterable together with its index (cf. this question and its answers). For example

[(index, x) for index, x in enumerate(["a", "b", "c"])] #returns
[(0, 'a'), (1, 'b'), (2, 'c')]

Note: index and x are not reserved names, just names I found convenient - [(foo, bar) for foo, bar in enumerate(["a", "b", "c"])] yields the same result.
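A small addition to the enumerate explanation: the built-in also accepts a start argument, which may be handy when the progress counter should begin at 1 rather than 0. The snippet below is only a sketch; the four-element mylist and the "every 2nd element" milestone are example values assumed here.

```python
mylist = ["a", "b", "c", "d"]

# start=1 makes the counter begin at 1 instead of 0.
pairs = list(enumerate(mylist, start=1))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

# With a 1-based counter, "count % 2 == 0" fires after the 2nd and 4th
# element and never for the very first one.
milestones = [count for count, _ in enumerate(mylist, start=1) if count % 2 == 0]
print(milestones)  # [2, 4]
```

With the default start of 0, a check such as index % 1000 == 0 is already true for the first element, which is why the methods above print 0 first.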
