Is it possible to split a sequence of pandas commands across multiple lines?

Question

I have a long string of pandas chained commands, for example:

df.groupby(['x','y']).apply(lambda x: (np.max(x['z'])-np.min(x['z']))).sort_values(ascending=False)

I would like to present it across multiple lines but still as a one-liner (without saving results to a temporary object or defining the lambda as a function).

An example of how I would like it to look:

df.groupby(['x','y'])
.apply(lambda x: (np.max(x['z'])-np.min(x['z'])))
.sort_values(ascending=False)

Is it possible to do so? (I know '_' has this functionality in Python, but it doesn't seem to work with chained commands.)

score 46 · Accepted Answer · edited Nov 03 '17 at 11:12


In Python you can continue an expression onto the next line by ending the line with a backslash or by enclosing the whole expression in parentheses.

df.groupby(['x','y']) \
    .apply(lambda x: (np.max(x['z'])-np.min(x['z']))) \
    .sort_values(ascending=False)

or

(df.groupby(['x','y'])
 .apply(lambda x: (np.max(x['z'])-np.min(x['z'])))
 .sort_values(ascending=False))
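As a minimal runnable sketch of the two forms above: the sample data and the names `with_backslash` and `with_parens` are made up for illustration and are not from the question. Note that `groupby` is called with parentheses here, and the lambda parameter is named `g` so it does not clash with the column name `x`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    'x': ['a', 'a', 'b', 'b'],
    'y': [1, 1, 2, 2],
    'z': [1.0, 4.0, 2.0, 10.0],
})

# Form 1: backslash continuation. Nothing, not even a space or a
# comment, may follow the backslash on the same line.
with_backslash = df.groupby(['x', 'y']) \
    .apply(lambda g: np.max(g['z']) - np.min(g['z'])) \
    .sort_values(ascending=False)

# Form 2: implied continuation inside an enclosing pair of parentheses.
with_parens = (df.groupby(['x', 'y'])
               .apply(lambda g: np.max(g['z']) - np.min(g['z']))
               .sort_values(ascending=False))

# Both spellings are the same expression, so the results match.
print(with_backslash.equals(with_parens))
```

Both variables should hold the same Series of per-group ranges of `z`, sorted from largest to smallest.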

answered Nov 26 '15 at 17:53 by GaryBishop · edited Nov 03 '17 at 11:12 by Dan

While this is correct syntax, it is rather unpythonic, as it makes code very hard to read. – cel Nov 26 '15 at 17:56
I agree, I don't like either option. – GaryBishop Nov 26 '15 at 18:06
I understand and agree that for code reading this is not the best option, however when presenting the code in a slide I think this would be more clear than breaking it into additional lines. Do you think otherwise? – user2808117 Nov 27 '15 at 08:56
In a slide it seems perfect. I code like that in JavaScript all the time. In Python, I generally avoid the backslash form. Parentheses used with indentation can be fine, IMO. – GaryBishop Nov 27 '15 at 12:28
For assigning to a new DataFrame: `df_cities = (df.groupby(['place_name'])['agent_count'] .sum() .reset_index() .sort_values(by='agent_count', ascending=False))` – Jesse Feb 24 '19 at 14:08

Zoli · Answer 2 · 2017-05-07T23:03:10.977

The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.

from https://www.python.org/dev/peps/pep-0008/#id19

So this may be better:

df.groupby(['x', 'y']).apply(
    lambda x: (np.max(x['z'])-np.min(x['z']))
).sort_values(ascending=False)

The "_" variable, which holds the last printed expression, exists only in the interactive Python console, so without an explicit assignment it cannot be used for that purpose in a script or module.
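To illustrate the PEP 8 passage quoted above, here is a small pandas-free sketch. All names in it are made up for illustration. It only shows that an open parenthesis, bracket or brace lets an expression continue onto the next line without a backslash.

```python
# Implied line continuation inside parentheses.
total = (1
         + 2
         + 3)

# Inside square brackets.
squares = [
    n * n
    for n in range(4)
]

# Inside braces.
lookup = {
    'x': 1,
    'y': 2,
}

# A method chain can break inside a call's own parentheses,
# which is the layout used in this answer.
label = "{}-{}".format(
    'x', 'y'
).upper()

print(total, squares, lookup, label)
```

The last statement mirrors the `.apply(...)` layout in this answer: the line break sits inside the call's argument list, so no extra wrapping parentheses are needed.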

I think this should be the accepted answer, since it's based on a widely accepted standard rather than on an opinion about what looks nicer. – Liz, Apr 10 '19 at 17:55

Matthias Fripp · Answer 3 · 2021-05-20T18:12:44.697

Since this has the nature of a command, I would probably format it close to your example, like this:

df.groupby(['x','y']) \
    .apply(lambda x: np.max(x['z'])-np.min(x['z'])) \
    .sort_values(ascending=False)

It took me a long time to realize I could break these expressions before the dots, which is often more readable than breaking inside the parentheses (same goes for "some long string".format()).

If this were more like an expression evaluation, I'd wrap the whole thing in parentheses, which is considered more "Pythonic" than line continuation markers:

var = (
    df.groupby(['x','y'])
        .apply(
            lambda x: np.max(x['z'])-np.min(x['z'])
        )
        .sort_values(ascending=False)
)

Update: Since writing this, I've moved away from backslashes for line continuation whenever possible, including here, where it isn't meaningful to chain the operations without assigning the result to a variable or passing it to a function. I've also switched to using one level of indentation for each level of nesting inside parentheses or brackets, to avoid going too deep and/or getting a wiggly effect. So I would now write your expression like this:

var = (
    df
    .groupby(['x','y'])
    .apply(
        lambda x: np.max(x['z']) - np.min(x['z'])
    )
    .sort_values(ascending=False)
)
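A runnable sketch of this layout, assuming a small made-up DataFrame (the data is hypothetical and not from the question). It also shows one practical difference from the backslash form: inside parentheses, a comment can sit on its own line between the steps of the chain.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    'x': ['a', 'a', 'b', 'b'],
    'y': [1, 1, 2, 2],
    'z': [1.0, 4.0, 2.0, 10.0],
})

var = (
    df
    .groupby(['x', 'y'])
    # range of z within each (x, y) group
    .apply(
        lambda g: np.max(g['z']) - np.min(g['z'])
    )
    # largest range first
    .sort_values(ascending=False)
)

print(var)
```

With backslash continuation the same comments would not be possible, because nothing may follow the backslash and a comment-only line would end the statement.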

This is a great explanation. Do you have a good heuristic for remembering to put the periods at the beginning of the line instead of at the end? As I understand it works both ways, but I assume one is more common (and with good reason). — emem_tee, Jun 14 '23 at 20:28
@emem_tee I've just settled into a habit of using parentheses around expressions and breaking them just _before_ operators. I find operators at the end of lines tend to be hard to see, but they are easy to catch at the start. The `.` is a kind of operator (that's the key insight here), and I end up putting it at the start of the line for the same reason as other operators. Wrapping expressions inside parentheses is an established and preferred Python technique. Putting the dots at the start seems to be up for debate but [somewhat preferred](https://stackoverflow.com/a/7942617/). — Matthias Fripp, Jun 15 '23 at 23:51
