
Here is my dataframe:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'col1':['A','A','A','B','B','B'], 'col2':['C','D','D','D','C','C'], 
            'col3':[.1,.2,.4,.6,.8,1]})
In [3]: df
Out[3]: 
  col1 col2  col3
0    A    C   0.1
1    A    D   0.2
2    A    D   0.4
3    B    D   0.6
4    B    C   0.8
5    B    C   1.0

My question is: when I wrap a long line, where should I put the backslash, after the dot or before it? Which is correct?

# option 1 backslash after dot or comma
df.groupby('col1').\
    sum()
df['col1'],\
    df['col2']

# option 2 backslash before dot or comma
df.groupby('col1')\
    .sum()
df['col1']\
    ,df['col2']

I also found that if I use parentheses, I do not need the backslash at all. In that case, which option is correct?

# option 1: no backslash and dot or comma in the new line
(df.groupby('col1')
    .sum())
(df['col1']
    ,df['col2'])

# option 2: no backslash and dot or comma in the old line
(df.groupby('col1').
    sum())
(df['col1'],
    df['col2'])

# option 3: backslash after dot or comma 
(df.groupby('col1').\
    sum())
(df['col1'],\
    df['col2'])

# option 4: backslash before dot or comma 
(df.groupby('col1')\
    .sum())
(df['col1']\
    ,df['col2'])
leomax

1 Answer


Explanation

PEP 8 prefers implicit line continuation inside parentheses, brackets, or braces over backslashes wherever that is possible.

PEP 8 doesn't explicitly say whether a dot or comma must stay on the same line as the expression it follows (though every example it gives is written that way).
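
To make it concrete that this is purely a style question, here is a minimal sketch using the standard-library ast module. It parses the four method-chain layouts from the question and compares each with the one-line form; the dictionary keys and variable names are only illustrative, and nothing is executed, so no dataframe is needed.

```python
import ast

# Four layouts of the same method chain, copied from the question.
# "\\\n" is a literal backslash followed by a newline.
layouts = {
    "backslash after dot": "df.groupby('col1').\\\n    sum()",
    "backslash before dot": "df.groupby('col1')\\\n    .sum()",
    "parentheses, dot on new line": "(df.groupby('col1')\n    .sum())",
    "parentheses, dot on old line": "(df.groupby('col1').\n    sum())",
}

# Syntax tree of the expression written on a single line.
reference = ast.dump(ast.parse("df.groupby('col1').sum()"))

# ast.dump omits line and column positions by default, so two sources
# compare equal exactly when they describe the same expression.
same_tree = {name: ast.dump(ast.parse(src)) == reference
             for name, src in layouts.items()}

for name, same in same_tree.items():
    print(f"{name}: {same}")
```

Every layout yields the same tree, so Python treats them identically; the only difference is how readable they are.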

Solution

The technically correct answers would be:

# option 1: no backslash and dot or comma in the new line
(df.groupby('col1')
    .sum())
(df['col1']
    ,df['col2'])

# option 2: no backslash and dot or comma in the old line
(df.groupby('col1').
    sum())
(df['col1'],
    df['col2'])

One could argue, though, that PEP 8's recommendation against a space between a trailing comma and a closing parenthesis is an exception that implies the general rule: everywhere else, a comma is followed by a space. By that reading, ,df['col2'] doesn't follow the standard (though , df['col2'] still does).

Although both options above are technically correct, the following combination is the most commonly used:

(df.groupby('col1')
    .sum())
(df['col1'],
    df['col2'])

Note: the right indentation depends on the context in which the code is used, so the examples above shouldn't be taken as a reference for it. Also remember that PEP 8 is only a guideline; there are many situations where its rules should be broken to improve readability.
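
As a rough illustration of how indentation can vary with context, the sketch below shows two continuation styles that PEP 8 accepts inside brackets. The list reuses the col3 values from the question as plain Python floats; the names large and small are made up for this example.

```python
col3 = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0]

# Style 1: continuation lines aligned with the opening delimiter.
large = sorted(value
               for value in col3
               if value > 0.5)

# Style 2: hanging indent, with nothing after the opening bracket
# and the contents indented one level.
small = sorted(
    value
    for value in col3
    if value <= 0.5
)

print(large, small)
```

Which of the two reads better usually depends on how long the first line already is.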

Nemanja Mirić
  • Thank you for your answer. According to https://stackoverflow.com/questions/53162/how-can-i-do-a-line-break-line-continuation-in-python, I should put '+' on the first line. Does that mean it is better to put ',' and operators on the first line, and other symbols on the second line? – leomax May 09 '20 at 03:12
  • [No, you shouldn't put `+` on the first line](https://www.python.org/dev/peps/pep-0008/#should-a-line-break-before-or-after-a-binary-operator). PEP 8 prefers placing `+` on the second line. That means only commas (`,`) stay on the first line, while binary operators and the member access operator (`.`) go on the second line. I can't say much about 'other symbols', as it really depends on the context. – Nemanja Mirić May 09 '20 at 06:49
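
The rule from the last comment can be sketched with plain Python values, so it runs without pandas. The list reuses the col3 numbers from the question; the names total, pair, and text are only illustrative.

```python
col3 = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0]

# Binary operator: break before it, so each continuation line starts with "+".
total = (col3[0]
         + col3[1]
         + col3[2])

# Comma: stays at the end of the first line.
pair = (col3[0],
        col3[1])

# Member access: the dot starts the continuation line.
text = ("Col1 Total"
        .lower()
        .replace(" ", "_"))

print(total, pair, text)
```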