pandas dataframe and multi line values printout as string?

Question

I want to do the same as in "pandas dataframe and multi line values", except with multiple columns of multi-line text:

import pandas as pd

data = [
       {'id': 1, 'col_one': 'very long text\ntext line 2\ntext line 3', 'col_two': 'very long text\ntext line 4\ntext line 5'},
       {'id': 2, 'col_one': 'short text', 'col_two': 'very long text\ntext line 6\ntext line 7'}
       ]
df = pd.DataFrame(data)
df.set_index('id', inplace=True)
print(df)

This prints as:

                                     col_one                                   col_two
id
1   very long text\ntext line 2\ntext line 3  very long text\ntext line 4\ntext line 5
2                                 short text  very long text\ntext line 6\ntext line 7

... and my desired output is:

id            col_one          col_two
1      very long text   very long text
       text line 2      text line 4
       text line 3      text line 5
2      short text       very long text
                        text line 6
                        text line 7

However, two of the answers there mention .stack(), which adds an extra index level next to the id column that I do not want. Actually, this:

print(df.col_one.str.split("\n", expand=True).stack())

# prints:
id
1   0    very long text
    1       text line 2
    2       text line 3
2   0        short text
dtype: object

... might sort of work (I would have to suppress the printout of the new row index somehow), but it's one column only, and I want the entire table.

And, the remaining answer mentions this:

from IPython.display import display, HTML

def pretty_print(df):
    return display(HTML(df.to_html().replace("\\n","<br>")))

... which would seemingly do what I want. The problem is that display apparently requires an interactive environment (such as a Jupyter notebook), whereas I want to use this in a PyQt5 application. When I try the above function there, I simply get:

<IPython.core.display.HTML object>

... printed in the terminal from where I run the PyQt5 application - and the plainTextEdit which was supposed to contain this text shows nothing.

So, how can I do the same as the above pretty_print function - but get a plain, multiline, formatted string as output, which I can use elsewhere?

score 1 · Answer 1 · answered Jun 26 '20 at 12:05

Well, I went the hard way and coded a function for this. The caveat is that it loses the index (id becomes an ordinary column), so the column names are no longer printed in the row above the index name. Good enough for me, I guess.

import pandas as pd

data = [
       {'id': 1, 'col_one': 'very long text\ntext line 2\ntext line 3', 'col_two': 'very long text\ntext line 4'},
       {'id': 2, 'col_one': 'short text', 'col_two': 'very long text\ntext line 6\ntext line 7'}
       ]
df = pd.DataFrame(data)
df.set_index('id', inplace=True)

def get_df_multiline_printstring(indf_in):
  broken_dfs = []
  # Turn the index back into a regular column so it is printed like the others
  indf = indf_in.reset_index()
  # Iterate over all columns by position
  for icol in range(indf.shape[1]):
    # Select the column by index position using iloc[]; dtype is 'object' for the string columns here
    columnSeriesObj = indf.iloc[:, icol]
    # The .str accessor only works on strings ("AttributeError: Can only use .str accessor
    # with string values!"), so convert every element to str first;
    # casting the column with astype(object) does not work
    columnSeriesObj = columnSeriesObj.apply(str)
    # Split each value on newlines and stack the pieces, one row per text line
    broken_dfs.append( columnSeriesObj.str.split("\n", expand=True).stack() )
  # Note: without keys=, the column names in the concat result become 0, 1, ...
  df_concat = pd.concat( broken_dfs, axis=1, keys=indf.columns )
  # Cells with fewer text lines than their neighbours end up as NaN; clear them
  df_concat = df_concat.fillna("")
  # index=False suppresses the printout of the row index
  return df_concat.to_string(index=False)

print( get_df_multiline_printstring(df) )

This prints:

id         col_one         col_two
 1  very long text  very long text
       text line 2     text line 4
       text line 3
 2      short text  very long text
                       text line 6
                       text line 7
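
As an alternative to the function above, here is a minimal sketch that skips .stack() and builds the string by hand, which also left-aligns the cells as in the desired output from the question. The name multiline_table_string and its sep parameter are made up for illustration, and it assumes every cell can be converted with str(); treat it as a starting point rather than a tested general solution.

```python
import pandas as pd


def multiline_table_string(df, sep="  "):
    """Render df as plain text, expanding embedded newlines into extra rows."""
    # Bring the index back as a regular column so it is printed too
    flat = df.reset_index()
    headers = [str(c) for c in flat.columns]

    # Each record is a list of cells; each cell is a list of text lines
    records = [
        [str(value).split("\n") for value in row]
        for row in flat.itertuples(index=False, name=None)
    ]

    # Column width: the longest header or single text line in that column
    widths = [len(h) for h in headers]
    for record in records:
        for i, cell in enumerate(record):
            widths[i] = max(widths[i], *(len(line) for line in cell))

    def fmt(cells):
        # Left-align every cell and drop trailing padding
        return sep.join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    out = [fmt(headers)]
    for record in records:
        # A record needs as many printed rows as its tallest cell has lines
        height = max(len(cell) for cell in record)
        for k in range(height):
            out.append(fmt([cell[k] if k < len(cell) else "" for cell in record]))
    return "\n".join(out)


# Same data as in the question
data = [
    {'id': 1, 'col_one': 'very long text\ntext line 2\ntext line 3',
     'col_two': 'very long text\ntext line 4\ntext line 5'},
    {'id': 2, 'col_one': 'short text',
     'col_two': 'very long text\ntext line 6\ntext line 7'},
]
df = pd.DataFrame(data).set_index('id')
text = multiline_table_string(df)
print(text)
```

Since the result is an ordinary str, it can be passed on to whatever needs plain text; the columns only line up when it is shown in a monospace font.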
