
This is a Python newbie question (thanks to the post "Import multiple excel files into python pandas and concatenate them into one dataframe").

The script is:

import os

files = os.listdir('C:\\TEST')
files_pdf = [f for f in files if f[-3:] == 'pdf']
print(files_pdf)

It gives the names of all the PDF files in the folder.

I am trying to understand it at a basic level. I guess the longest line above is equivalent to:

files_pdf = []
for f in files:
    if f[-3:] == 'pdf':
        files_pdf.append(f)

My question is: what is the difference between the two? And what is the reason or principle behind the 'f for f in files' part?

[for f in files if f[-3:] == 'pdf']   #doesn't work
[f for f in files if f[-3:] == 'pdf'] #works

Thanks.

Mark K

1 Answer


This is called a list comprehension.

For example:

nums = [1, 2, 3, 4, 5]
squares = [x**2 for x in nums]
# squares: [1, 4, 9, 16, 25]

The first part (the expression before for) is what gets added to the new list, so it cannot be left out.
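As a rough sketch of that point, the leading expression can be any expression, not just the loop variable. The file names below are made up for illustration, and compile() is used only to show that the form without a leading expression is rejected as a syntax error.

```python
# Hypothetical file names, used only for illustration
files = ['report.pdf', 'notes.txt', 'scan.pdf']

# The leading expression decides what goes into the new list
names = [f for f in files]          # the items themselves
upper = [f.upper() for f in files]  # a transformed value
lengths = [len(f) for f in files]   # something else entirely

# Without a leading expression, Python cannot even parse the comprehension
try:
    compile("[for f in files if f[-3:] == 'pdf']", '<example>', 'eval')
except SyntaxError:
    missing_expression_is_error = True
else:
    missing_expression_is_error = False

print(upper)                        # ['REPORT.PDF', 'NOTES.TXT', 'SCAN.PDF']
print(lengths)                      # [10, 9, 8]
print(missing_expression_is_error)  # True
```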

So

[f for f in files if f[-3:] == 'pdf']

just means "form a list from every f in files for which f[-3:] == 'pdf'". It produces the same list as your explicit loop with append.
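To make the equivalence concrete, here is a minimal sketch that runs both forms on the same input. The file names are hypothetical stand-ins for what os.listdir would return, and the endswith variant is an alternative I am suggesting rather than something from the question.

```python
# Hypothetical file names standing in for os.listdir('C:\\TEST')
files = ['report.pdf', 'notes.txt', 'scan.pdf', 'data.xlsx']

# Explicit loop, as written in the question
loop_result = []
for f in files:
    if f[-3:] == 'pdf':
        loop_result.append(f)

# Equivalent list comprehension
comp_result = [f for f in files if f[-3:] == 'pdf']

# Alternative test using str.endswith (also checks the dot)
endswith_result = [f for f in files if f.endswith('.pdf')]

print(loop_result == comp_result)  # True
print(comp_result)                 # ['report.pdf', 'scan.pdf']
```

Both forms build the same list; the comprehension just does it in a single expression.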

There are some really useful tricks with list comprehensions.
For example, if you want to create a list containing several empty lists for later use, you can write

multi_list = [[] for i in range(10)]   # correct
multi_list = [[]]*10                   # wrong!

This is because [[]]*10 repeats a reference to the same single list ten times,
whereas [[] for i in range(10)] creates 10 independent lists.
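A small sketch of the difference, using 3 lists instead of 10 to keep the output short (the variable names are mine):

```python
shared = [[]] * 3                     # three references to ONE list
independent = [[] for i in range(3)]  # three separate lists

# Append to the first element of each
shared[0].append('x')
independent[0].append('x')

print(shared)       # [['x'], ['x'], ['x']]
print(independent)  # [['x'], [], []]

# Identity checks confirm what happened
print(shared[0] is shared[1])            # True
print(independent[0] is independent[1])  # False
```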

For more information, check the documentation on list comprehensions.

LeoMao