
I have a data frame:

import pandas as pd

df = pd.DataFrame(data=[[1, 2]], columns=['a', 'b'])

I'm aware I can do the following to change all column names in a dataframe:

df.columns = ['d', 'e']

How can I change all column names in a chained operation? For example, I would like to do something like:

df = (
    df  # <- rename all column names here (e.g. to ['d', 'e'])
    .reset_index()
)

The only way I can find is to use df.rename and build a dictionary of old and new column pairs, but that looks very ugly. Are there any more elegant solutions?
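For reference, the dictionary-based approach described above might look like the following minimal sketch. The mapping `{'a': 'd', 'b': 'e'}` is written out by hand for the two-column example frame, which is exactly the verbosity the question is trying to avoid.

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2]], columns=['a', 'b'])

# rename() returns a new DataFrame, so it can sit inside a chain,
# but every old column name has to be spelled out as a dictionary key.
df = (
    df.rename(columns={'a': 'd', 'b': 'e'})
    .reset_index()
)
print(df)
```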

Thanks.

Allen Qin
  • Not sure if it is similar to https://stackoverflow.com/a/16667215/5916727 – niraj Feb 11 '18 at 21:38
  • Not quite. I want to give columns completely new names. There's no need to reference old names. – Allen Qin Feb 11 '18 at 21:41
  • How do you intend to map old_col_names -> new_col_names without any of: ordinal position, dictionary map, or lambda function? – jpp Feb 11 '18 at 21:46
  • That's my question. I just want to give all columns completely new names. There's no relation or mapping to the old names. – Allen Qin Feb 11 '18 at 21:48
  • Ah so the only issue is chaining it? – jpp Feb 11 '18 at 21:52
  • That's right. df.columns = ... does what I need, but it's an assignment that doesn't return the DataFrame, so it can't be used in a chained operation. – Allen Qin Feb 11 '18 at 21:55
  • Note that long chained operations are [considered by some experts](https://stackoverflow.com/a/2443559/190597) a [code smell](https://en.wikipedia.org/wiki/Law_of_Demeter). – unutbu Feb 11 '18 at 21:56
  • Nevertheless, you could use `df.rename(columns=dict(zip(df.columns, ['d','e'])))`. – unutbu Feb 11 '18 at 21:58
  • Thanks, unutbu, but I'm only chaining two operations here. – Allen Qin Feb 11 '18 at 22:00
  • I personally despise the yearning for one-liners; for readability, I would write a function `def renamer(df, cols): df.columns = cols; return df`, then call `renamer(df, cols)` in the chain. – jpp Feb 11 '18 at 22:00
  • @unutbu I'm aware of this option and I think it's not very elegant. If I can do df.rename(columns=['d','e']), that'll be perfect. – Allen Qin Feb 11 '18 at 22:02
  • Then you could do as @jp_data_analysis suggests, or just bite the bullet and write the somewhat long-winded `df.rename(columns=dict(zip(df.columns, ['d','e'])))`. You could raise a [github issue](https://github.com/pandas-dev/pandas/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+rename+columns) for `df.rename(columns=['d','e'])`, but at the moment that is not part of the Pandas API. – unutbu Feb 11 '18 at 22:02
  • There is already a github issue proposing `df.rename(columns=['d','e'])` : https://github.com/pandas-dev/pandas/issues/14829 – unutbu Feb 11 '18 at 22:06
  • @jp_data_analysis I find chained operations make code more organised. See this blog post by one of the pandas developers: https://tomaugspurger.github.io/method-chaining – Allen Qin Feb 11 '18 at 22:07
  • @unutbu, it's great that it has already been raised and is being worked on. Thanks for the info. It looks like there's currently no simpler way of doing it. – Allen Qin Feb 11 '18 at 22:12
  • @Allen: You could use @jp_data_analysis's idea (`def renamer...`) and then do `df.pipe(renamer, ['d','e'])`, similar to what tomaugspurger does in the blog. – unutbu Feb 11 '18 at 22:15
  • @unutbu, it seems there's an answer in the comments of the issue you mentioned. – Allen Qin Feb 11 '18 at 22:36
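The two workarounds suggested in the comments can be sketched side by side as follows. `renamer` is the hypothetical helper name from jpp's comment; the `df.copy()` inside it is an addition not present in the comment, included so that piping does not modify the caller's original frame. `DataFrame.pipe` forwards any extra positional arguments to the function it is given.

```python
import pandas as pd


def renamer(df, cols):
    """Hypothetical helper from the comments: set new column labels and
    return the frame so it can be used with .pipe()."""
    df = df.copy()  # added here: avoid mutating the caller's DataFrame
    df.columns = cols
    return df


df = pd.DataFrame(data=[[1, 2]], columns=['a', 'b'])

# Option 1: build the old -> new mapping from the existing column labels.
via_rename = (
    df.rename(columns=dict(zip(df.columns, ['d', 'e'])))
    .reset_index()
)

# Option 2: hand the helper to pipe(); ['d', 'e'] is forwarded as `cols`.
via_pipe = (
    df.pipe(renamer, ['d', 'e'])
    .reset_index()
)

print(via_rename)
print(via_pipe)
```

Both produce the same frame; the pipe version keeps the chain readable at the cost of a small helper function.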

1 Answer


Thanks to @unutbu for pointing to a GitHub issue. As one of the comments there shows, this can be done with set_axis:

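import pandas as pd

df = pd.DataFrame(data=[[1, 2]], columns=['a', 'b'])
print(df)
#    a  b
# 0  1  2

df2 = (
    # The original answer passed inplace=False (older pandas modified
    # in place by default); pandas 2.0 removed that argument.
    df.set_axis(['d', 'e'], axis=1)
    .reset_index()
)
print(df2)
#    index  d  e
# 0      0  1  2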
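What makes set_axis usable in a chain is that it returns a new DataFrame rather than modifying the existing one. The short check below, assuming a recent pandas version, shows that the original labels are left untouched and that `axis='columns'` can be used as a more readable spelling of `axis=1`.

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2]], columns=['a', 'b'])

# set_axis returns a new object with the given labels on the chosen axis;
# axis=1 and axis='columns' refer to the same axis.
renamed = df.set_axis(['d', 'e'], axis='columns')

print(df.columns.tolist())       # original labels are unchanged
print(renamed.columns.tolist())  # new labels on the returned frame
```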
Allen Qin