
With lengthy column names, DataFrames will display in a very messy form seemingly no matter what options are set.

Info: I'm in Jupyter QtConsole, pandas 0.20.1, with the following relevant options specified at startup:

pd.set_option('display.max_colwidth', 20)
pd.set_option('expand_frame_repr', False)
pd.set_option('display.max_rows', 25)

Question: how can I truncate the DataFrame if necessary rather than wrapping the columns to the next line, while keeping expand_frame_repr=False?

Here's an example. Again, the issue depends not on the number of columns but on the length of the column names.

This will not cause an issue:

df = pd.DataFrame(np.random.randn(1000, 1000),
                  columns=['col' + str(i) for i in range(1000)])

The output is perfectly readable: [screenshot of the output]

The same DataFrame with long column names causes the issue I'm talking about:

df = pd.DataFrame(np.random.randn(1000, 1000),
                  columns=['very_long_col_name_' 
                           + str(i) for i in range(1000)])

[screenshot of the output]

Is there any way to conform the second output to be like the first that I'm missing? (Through specifying an option, not through using .iloc every time I want to view.)

Brad Solomon
  • Hmm, this is not a problem on Ipython3. The columns are split by a `/` and subsequent columns are shifted below the first group. – cs95 Jul 11 '17 at 20:48
  • what does `pd.options.display.line_width` give you? when it's correct it usually looks OK, but in a couple of consoles that can't calculate it automatically (qtconsole was like this IIRC) it comes up as None and things don't always look great in that case... – Corley Brigman Jul 11 '17 at 20:51
  • @Coldspeed I'm guessing you have `expand_frame_repr=True`. I'd like to avoid that representation. Check your `pd.options.display.expand_frame_repr` – Brad Solomon Jul 11 '17 at 20:51
  • @Corley 80. Also that's now `pd.options.display.width` just fyi – Brad Solomon Jul 11 '17 at 20:53
  • i take that back... i hadn't tried with `expand_frame_repr`, which looks bad here even in the regular console. – Corley Brigman Jul 11 '17 at 20:53
  • Yeah, so it seems to be a catch 22. I don't particularly like the look of `expand_frame_repr` with `/` between columns, but want the option to truncate automatically. – Brad Solomon Jul 11 '17 at 20:54
  • can you try: `from pandas.io.formats import console; console.get_console_size()`. it looks like if you cast to unicode, then it gets the width from this function, else it leaves it as None and passes down. if I try e.g. `df.to_string(line_width=200)`, i get reasonable-looking output; but if I leave it out, it looks awful. edit: checking console not necessary, it's only passed if 'expand_frame_repr' is chosen. maybe better question is: what are you hoping it will look like? i think without it, it just writes single untruncated lines. – Corley Brigman Jul 11 '17 at 21:04
  • @Corley `(80, 25)`. Not sure if I'm following you here though. – Brad Solomon Jul 11 '17 at 21:09

3 Answers


Use max_columns

from string import ascii_letters

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(5, 52)), columns=list(ascii_letters))

with pd.option_context(
    'display.max_colwidth', 20,
    'expand_frame_repr', False,
    'display.max_rows', 25,
    'display.max_columns', 5,
):
    print(df.add_prefix('really_long_column_name_'))

   really_long_column_name_a  really_long_column_name_b            ...              really_long_column_name_Y  really_long_column_name_Z
0                    8                          1                  ...                                1                          9      
1                    8                          5                  ...                                2                          1      
2                    5                          0                  ...                                9                          9      
3                    6                          8                  ...                                0                          9      
4                    1                          2                  ...                                7                          1      

[5 rows x 52 columns]

Another idea... Obviously not exactly what you want, but maybe you can twist it to your needs.

d1 = df.add_suffix('_really_long_column_name')

with pd.option_context('display.max_colwidth', 4, 'expand_frame_repr', False):
    mw = pd.get_option('display.max_colwidth')
    print(d1.rename(columns=lambda x: x[:mw-3] + '...' if len(x) > mw else x))

   a...  b...  c...  d...  e...  f...  g...  h...  i...  j...  ...   Q...  R...  S...  T...  U...  V...  W...  X...  Y...  Z...
0    6     5     5     5     8     3     5     0     7     6   ...     9     0     6     9     6     8     4     0     6     7 
1    0     5     4     7     2     5     4     3     8     7   ...     8     1     5     3     5     9     4     5     5     3 
2    7     2     1     6     5     1     0     1     3     1   ...     6     7     0     9     9     5     2     8     2     2 
3    1     8     7     1     4     5     5     8     8     3   ...     3     6     5     7     1     0     8     1     4     0 
4    7     5     6     2     4     9     7     9     0     5   ...     6     8     1     6     3     5     4     2     3     2 
piRSquared
  • Thanks, but I would like to see if there are other solutions as well. This would require setting `pd.option_context` every time I want to print this type of long-column DataFrame, no? (Otherwise prettier DataFrames would be unnecessarily truncated if the option was set outright, without context.) – Brad Solomon Jul 11 '17 at 21:10
  • Absolutely... I can concoct another solution that doesn't alter the options if you're interested. Is displaying the entire column name necessary? – piRSquared Jul 11 '17 at 21:14
  • Good question ... I'm open to either but would lean towards preferring col names truncated at the colwidth I have set in options. – Brad Solomon Jul 11 '17 at 21:15
  • Yeah, that's what I was thinking – piRSquared Jul 11 '17 at 21:16
  • @BradSolomon Updated post – piRSquared Jul 11 '17 at 22:37
  • Thank you @piRSquared, going to leave this one open for a bit to see if any alternatives pop up – Brad Solomon Jul 12 '17 at 02:21
  • @BradSolomon np, I would too – piRSquared Jul 12 '17 at 02:22
  • I like the ideas in this answer. For ad hoc tasks I often will do: df.rename(columns=lambda x: x[:9], inplace=False), but having the max width available as in this answer I am going to update my snippet to df.rename(columns=lambda x: x[:mw], inplace=False). This assumes mw was already acquired and may fit certain use cases. – Paul Jul 15 '22 at 15:07

Looks like it will need an enhancement. The relevant code in the repr function appears to be here:

    max_rows = get_option("display.max_rows")
    max_cols = get_option("display.max_columns")
    show_dimensions = get_option("display.show_dimensions")
    if get_option("display.expand_frame_repr"):
        width, _ = console.get_console_size()
    else:
        width = None
    self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
                   line_width=width, show_dimensions=show_dimensions)

So either you pass expand_frame_repr=True and it wraps on the line width, or you pass expand_frame_repr=False and it shouldn't. But it looks like there is a bug in the code (this should be pandas 0.20.3 iirc):

In pd.io.formats.format.DataFrameFormatter:

def _chk_truncate(self):
    """
    Checks whether the frame should be truncated. If so, slices
    the frame up.
    """
    from pandas.core.reshape.concat import concat

    # Column of which first element is used to determine width of a dot col
    self.tr_size_col = -1

    # Cut the data to the information actually printed
    max_cols = self.max_cols
    max_rows = self.max_rows

    if max_cols == 0 or max_rows == 0:  # assume we are in the terminal
                                        # (why else = 0)
        (w, h) = get_terminal_size()
        self.w = w
        self.h = h
        if self.max_rows == 0:
            dot_row = 1
            prompt_row = 1
            if self.show_dimensions:
                show_dimension_rows = 3
            n_add_rows = (self.header + dot_row + show_dimension_rows +
                          prompt_row)
            # rows available to fill with actual data
            max_rows_adj = self.h - n_add_rows
            self.max_rows_adj = max_rows_adj

        # Format only rows and columns that could potentially fit the
        # screen
        if max_cols == 0 and len(self.frame.columns) > w:
            max_cols = w
        if max_rows == 0 and len(self.frame) > h:
            max_rows = h

It looks like this was intended to do what you want, but was left unfinished. It compares the number of columns (len(self.frame.columns)) against the terminal width w, rather than comparing the total printed width of the columns against it.

So you could either create a show_df function that would calculate the correct number of columns and show it in an option_context like piRSquared's answer, or fix it here (and maybe submit a patch if you need it distributed).
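One way such a show_df helper could look, as a rough sketch rather than a tested patch. The name format_df and the width parameter are hypothetical, and the column count is estimated from the widest header only, which assumes headers are wider than the cell values (as in the question).

```python
import shutil

import numpy as np
import pandas as pd


def format_df(df, width=None, max_rows=25):
    """Return repr(df) truncated to roughly `width` characters per line.

    Hypothetical helper: the column count is estimated from the widest
    header only, so it assumes headers are wider than the cell values.
    """
    if width is None:
        # Falls back to 80 columns when no terminal is attached.
        width = shutil.get_terminal_size().columns
    index_width = max((len(str(label)) for label in df.index), default=0)
    # Widest header plus the two-space gap pandas puts between columns.
    col_width = max((len(str(name)) for name in df.columns), default=0) + 2
    # Reserve 5 characters for the "..." column that marks the truncation.
    n_cols = max(1, (width - index_width - 5) // col_width)
    with pd.option_context('display.max_columns', n_cols,
                           'display.expand_frame_repr', False,
                           'display.max_rows', max_rows):
        return repr(df)


def show_df(df, **kwargs):
    """Print df without touching the global display options."""
    print(format_df(df, **kwargs))


df = pd.DataFrame(np.random.randn(30, 100),
                  columns=['very_long_col_name_' + str(i) for i in range(100)])
show_df(df, width=80)
```

Because the options are set inside option_context, DataFrames printed the normal way are unaffected. The estimate is deliberately conservative, so it may show fewer columns than would strictly fit.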

Corley Brigman

As others have pointed out, Pandas itself seems to be bugged or badly designed here, so a workaround is required.

Most of the time this problem occurs with numerical columns, since the numbers are short relative to their headings. In a Jupyter Notebook, a column heading that contains spaces is wrapped onto multiple lines, so you can "hack in" the correct behavior by inserting spaces into the headings of numerical columns when you display the DataFrame. I have a one-liner to do this:

def colfix(df, L=5): return df.rename(columns=lambda x: ' '.join(x.replace('_', ' ')[i:i+L] for i in range(0,len(x),L)) if df[x].dtype in ['float64','int64'] else x )

To display your DataFrame, simply type

colfix(your_df)

Note that the renaming does not permanently change the DataFrame; it only adds spaces to the names for the purposes of displaying it that one time.
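The colfix one-liner can also be written out in a more readable form. This is only a sketch of the same idea: the name colfix_readable and the chunk parameter are hypothetical, column names are assumed to be unique, and it uses pandas.api.types.is_numeric_dtype, which covers every numeric dtype (including bool) rather than just float64 and int64.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def colfix_readable(df, chunk=5):
    """Return a renamed copy of df whose numeric headers contain a space
    after every `chunk` characters, so the notebook can wrap them."""

    def split_name(name):
        # Underscores become spaces, then a space is inserted between chunks.
        text = str(name).replace('_', ' ')
        return ' '.join(text[i:i + chunk] for i in range(0, len(text), chunk))

    # Only numeric columns are renamed; other headers are left untouched.
    mapping = {
        name: split_name(name)
        for name in df.columns
        if is_numeric_dtype(df[name])
    }
    return df.rename(columns=mapping)


df = pd.DataFrame({'very_long_col_name': [1.0, 2.0],
                   'label_column': ['a', 'b']})
print(colfix_readable(df).columns.tolist())
```

As with colfix, rename returns a new DataFrame, so the original column names are left as they were.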

Results (in a Jupyter Notebook): [screenshots of the DataFrame with colfix and without it]

Roko Mijic