I am new to Pandas and see that there are several ways to change column headers. For example, the set_axis method works like this:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.arange(3),columns=['a'])
>>> df
   a
0  0
1  1
2  2
>>> df["a"][0]
0
>>> df.set_axis(['A'],axis=1,inplace=True)
>>> df
   A
0  0
1  1
2  2
>>> df["A"][0]  
0

EDIT: Alternatively, one can assign to the columns attribute directly,

df.columns = ['A']

to change the column name.

But it seems that if I want to change the column header for display purposes only (because the display label is inconvenient to type as a key when indexing), I have to create an entirely new data frame:

>>> df_pretty = df.set_axis(['Long label (%)'],axis=1,inplace=False)
>>> df_pretty
   Long label (%)
0               0
1               1
2               2

Is this right, or am I missing something? It seems a waste of memory to create a new data frame just for printing. I would have thought that Pandas would store an internal "key" for each column and a separate column label used only for display.

Donna
  • Assign to `df.columns`. – cs95 Jan 13 '18 at 20:08
  • No, pandas doesn't have "labels" for variables. Both labels and variable names are the same thing. There are some feature requests but I wouldn't expect it to happen before version 2. – ayhan Jan 13 '18 at 20:09
  • You can always refer to columns by their integer position, using `.iloc`, regardless of the labels you assign. – andrew_reece Jan 13 '18 at 20:10
  • See the [milestone](https://github.com/pandas-dev/pandas/issues/11179). Someday. :) – ayhan Jan 13 '18 at 20:12
  • @ayhan Thanks - I just posted a comment on that issue [here](https://github.com/pandas-dev/pandas/issues/11179). I was surprised at some of the reasons for not including this feature. Apparently, the Pandas developers like short column names in their tables. Too bad not everyone is a Pandas developer! – Donna Jan 13 '18 at 20:49

2 Answers

The solution posted by @JohnE looks like the best way to go.

I would also like to apply a format string to each column, so I add a few more details here:

import pandas
df = pandas.DataFrame({'a' : [1,2,3],'b' : [4,5,6]})
di = {'a' : 'A (J/K*kg)', 'b' : 'B (N/m^2)'}
fstr = {di["a"] : '{:6.2f}', di["b"]:'{:5.2e}'}
df.rename(columns=di).style.format(fstr) 

This, rendered in a Jupyter notebook, looks perfect and does exactly what I want.

When I tried the same code at the Python prompt, though, the styler did not render; only the object's repr was shown.

>>> import pandas
>>> df = pandas.DataFrame({'a' : [1,2,3],'b' : [4,5,6]})
>>> di = {'a' : 'A (J/K*kg)', 'b' : 'B (N/m^2)'}
>>> fstr = {di["a"] : '{:6.2f}', di["b"]:'{:5.2e}'}
>>> df.rename(columns=di).style.format(fstr)
<pandas.io.formats.style.Styler object at 0x105812eb8>

EDIT: At the plain interactive prompt (not in a Jupyter notebook), the HTML formatting is not displayed, and the Styler does not appear to offer a basic ASCII rendering for tables.
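For plain-text output at the prompt, one option that seems to work is `DataFrame.to_string` with its `formatters` argument. This is only a sketch and not a replacement for the Styler: the `formatters` dictionary and the `text` variable are names I made up here, and the callables are my translation of the format strings above, since `formatters` expects one-argument functions rather than format-spec strings.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
di = {'a': 'A (J/K*kg)', 'b': 'B (N/m^2)'}

# Same format specs as in the Styler example, wrapped as callables:
# each function receives one cell value and returns the string to print.
formatters = {di['a']: '{:6.2f}'.format, di['b']: '{:5.2e}'.format}

# rename returns a relabelled frame; to_string renders it as plain text.
text = df.rename(columns=di).to_string(formatters=formatters)
print(text)
```

The short names `a` and `b` stay on `df` itself, so the long labels only appear in the printed string.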

Donna
  • I have no idea either. I use the jupyter qtconsole interface myself which has the same output as you show with the plain python prompt. I'd recommend just posting a new question and be sure to use a "jupyter" tag – JohnE Jan 14 '18 at 13:58
  • @JohnE I think that this `style` object renders only to HTML, and possibly LaTeX (something I'd like to figure out how to do). – Donna Jan 14 '18 at 16:36
If you first set up a dictionary for converting from short names to long names:

di = {'a':'long name for a'}

Then it's really easy to use rename to display the long names whenever you want:

df.rename(di,axis=1)

   long name for a
0                0
1                1
2                2

Note that this is just for one column, but once you set up the dictionary the syntax is just as concise for 100 columns as it is for 1.

You also don't have to make any permanent changes this way, because rename returns a relabelled data frame and leaves df as it was. Just add the rename method whenever you want to display things differently. Alternatively, store the long names in the permanent dataframe and use a dictionary to display the short names as needed.
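A minimal sketch of both directions, assuming the one-column data frame from the question. The names `pretty`, `inverse` and `short_again` are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(3), columns=['a'])
di = {'a': 'long name for a'}

# rename returns a relabelled object; df keeps its short names.
pretty = df.rename(columns=di)
print(list(pretty.columns))
print(list(df.columns))

# Going the other way: invert the dictionary to map long names back to short.
inverse = {long_name: short_name for short_name, long_name in di.items()}
short_again = pretty.rename(columns=inverse)
print(list(short_again.columns))
```

Because `df` is never reassigned, code that indexes with `df['a']` keeps working regardless of how the table is displayed.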

Honestly, I don't think this is any harder than if the labels were stored as column metadata, since even then you'd often want to specify short or long names explicitly and would need some sort of keyword argument for that. And because Python's dictionaries are so flexible, you have plenty of options here: you could store short, medium, and long names as dictionaries, set up functions to automatically create short names from long names, etc.
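As a rough sketch of that idea, the following keeps several label sets in one dictionary and picks one at display time. Everything here (`LABELS`, `with_labels`, `make_short`, and the rule for deriving a short name) is hypothetical and only meant to illustrate the pattern:

```python
import pandas as pd

# Hypothetical label sets, keyed by how verbose the display should be.
LABELS = {
    'medium': {'a': 'A', 'b': 'B'},
    'long': {'a': 'A (J/K*kg)', 'b': 'B (N/m^2)'},
}

def with_labels(df, style):
    """Return df relabelled with the chosen label set; 'short' means no change."""
    if style == 'short':
        return df
    return df.rename(columns=LABELS[style])

def make_short(long_name):
    """One possible rule: the text before the first space, lower-cased."""
    return long_name.split()[0].lower()

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
print(with_labels(df, 'long'))

# Build the long-to-short mapping automatically instead of typing it out.
to_short = {name: make_short(name) for name in LABELS['long'].values()}
print(to_short)
```

The `style` argument plays the role of the keyword argument mentioned above, so the calling code stays the same whether there are 2 columns or 100.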

JohnE