I have a directory of CSV data files and I load all of them in one line using pandas.read_csv() within a list comprehension statement.

import glob
import pandas as pd
file_list = glob.glob('../data/*.csv')  # match the CSV files, not the directory itself
df_list = [pd.read_csv(f) for f in file_list]
df = pd.concat(df_list, ignore_index=True)

Now I want to print the file path each time a data file is loaded, but I cannot find a way to use multiple statements in a list comprehension. For example, something like [pd.read_csv(f); print(f) for f in file_list] raises a SyntaxError.

The closest I can get is to rely on print() returning None in an if clause, which works like a pass after printing.

df_list = [pd.read_csv(f) for f in file_list if print(f) is None]

Is there a proper way of doing this? I like list comprehension for its conciseness, but it does not seem to allow multiple statements.

Takeshi

3 Answers

List comprehensions were not designed for that. They are meant only for populating a list by looping over some iterable, optionally keeping just the items that meet a condition. Python emphasises readability over saving lines of code.

The proper way to do what you want is to skip the list comprehension and use a for loop instead:

df_list = []  # start with an empty list
for f in file_list:
    print(f)  # report each path before loading it
    df_list.append(pd.read_csv(f))
Chris_Rands
Mayazcherquoi

If you want a list comprehension (understandable, since comprehensions are usually somewhat faster than an equivalent for loop), you could simplify your solution slightly, since None is falsy:

df_list = [pd.read_csv(f) for f in file_list if not print(f)]

Alternatively, write a function that does the work:

def read_and_print(f):
    print(f)
    return pd.read_csv(f)

df_list = [read_and_print(f) for f in file_list]

However, both approaches violate the Command–query separation principle that Python generally follows: in the first, a filter condition is used purely for its side effect, and in the second, the function has both a side effect and a return value of interest. Nonetheless, I think they are quite pragmatic, particularly if you want to print() now to see which files are loaded but plan to remove the print() calls later.
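For comparison, here is a rough sketch of a version that keeps the command (printing) and the query (loading) apart. The load function and the file names are hypothetical stand-ins so the snippet runs without pandas or real files; in the real code you would call pd.read_csv(f) instead.

```python
def load(path):
    # Hypothetical stand-in for pd.read_csv(path)
    return {"source": path}

file_list = ["a.csv", "b.csv"]  # placeholder paths, not real files

# Command: report the paths (side effect only, nothing returned).
for f in file_list:
    print(f)

# Query: build the list; the comprehension itself has no side effects.
df_list = [load(f) for f in file_list]
```

Note that this prints every path before any file is loaded, so it will not tell you which file was being read if one of them fails partway through.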

Chris_Rands

As already noted, you should generally not use functions with side-effects in a list comprehension. However, I appreciate that for debugging purposes something like this might be useful.

One way, similar to your if condition, would be to use or. This relies on the fact that print() returns None, which is falsy, so the or expression goes on to evaluate and return its second operand:

df_list = [print(f) or pd.read_csv(f) for f in file_list]
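To see why this works, here is a small sketch of how or evaluates. The load function and the file names are hypothetical stand-ins for pd.read_csv and real paths, so it runs without pandas.

```python
def load(path):
    # Hypothetical stand-in for pd.read_csv(path)
    return path.upper()

printed = print("a.csv")          # print() always returns None
value = printed or load("a.csv")  # None is falsy, so `or` yields the right operand

# The same expression inside a comprehension: print first, then load.
results = [print(p) or load(p) for p in ["a.csv", "b.csv"]]
```

The trick depends on the left-hand call always returning something falsy. If you swapped print for a function that returns a truthy value, that value would end up in the list instead of the loaded data.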

However, the or trick can be hard to read, and its intent is not very clear. Alternatively, you could define a peek function that prints and returns its argument, and use that in the comprehension:

def peek(x, *args, **kwargs):
    print(x, *args, **kwargs)
    return x

df_list = [pd.read_csv(peek(f)) for f in file_list]

You could also make this more generic, passing the function to be applied (print in this case) as another parameter to the peek function, or first checking whether some global debug_enabled variable is actually set to True.
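A minimal sketch of that more generic variant might look like this. The DEBUG_ENABLED name, the func parameter, and the sample paths are my own placeholders, not anything standard.

```python
DEBUG_ENABLED = True  # hypothetical module-level switch

def peek(x, func=print, *args, **kwargs):
    """Call func(x, ...) for its side effect when debugging is on, then return x."""
    if DEBUG_ENABLED:
        func(x, *args, **kwargs)
    return x

# Usage: record each path in a list instead of printing it.
seen = []
paths = ["a.csv", "b.csv"]  # placeholder paths, not real files
result = [peek(p, seen.append).upper() for p in paths]
```

With the default func=print this behaves like the peek above, and setting DEBUG_ENABLED to False turns it into a plain pass-through.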

tobias_k