
I'm viewing a Pandas DataFrame in a Jupyter Notebook, and my DataFrame contains URL request strings that can be hundreds of characters long without any whitespace separating characters.

Pandas seems to wrap text in a cell only when there's whitespace, as shown in the attached picture:


If there isn't any whitespace, the string is displayed on a single line. When there isn't enough room, I either see a '...', or I have to set display.max_colwidth to a huge number, which leaves me with a hard-to-read table and a lot of scrolling.

Is there a way to force Pandas to wrap text, say, every 100 characters, regardless of whether there is whitespace?

user1956609
  • Take a look at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.wrap.html , specifically the parameter `break_long_words`. – Shovalt Dec 20 '16 at 09:20
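Following up on that comment: `Series.str.wrap` accepts the same keyword parameters as Python's `textwrap.TextWrapper`, and `break_long_words` defaults to `True`, so a string with no whitespace is still cut at the requested width. A minimal sketch on a made-up 250-character value:

```python
import pandas as pd

# Hypothetical whitespace-free value, 250 characters long
s = pd.Series(["x" * 250])

# Default break_long_words=True: the single long "word" is cut every 100 characters
wrapped = s.str.wrap(100).iloc[0]

# break_long_words=False: textwrap will not split the word, so it stays on one line
unwrapped = s.str.wrap(100, break_long_words=False).iloc[0]

print([len(line) for line in wrapped.split("\n")])  # [100, 100, 50]
print(len(unwrapped.split("\n")))                    # 1
```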

7 Answers


You can set

import pandas as pd
pd.set_option('display.max_colwidth', 0)

and then each column will be just as wide as it needs to be to display its content in full. It will not wrap the text inside the cells, though (unless they contain spaces).
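If you only need the full width temporarily, `pd.option_context` restores the previous value when the block exits. In pandas 1.0 and later, `None` is the documented value for "no limit". A sketch with a hypothetical URL column:

```python
import pandas as pd

# Hypothetical request string, far longer than the default 50-character limit
long_url = "https://example.com/search?q=" + "x" * 200
df = pd.DataFrame({"url": [long_url]})

# Inside the block the column is shown in full; the old setting returns afterwards
with pd.option_context("display.max_colwidth", None):
    full_repr = repr(df)

# Outside the block the default limit applies again, so the value is cut off with "..."
truncated_repr = repr(df)
```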

paulo.filip3

You can use the str.wrap method:

df['user_agent'] = df['user_agent'].str.wrap(100)  # set a maximum line width of 100
O.Suleiman

Try wrapping the text first, then run the function below. The top-voted answer does not actually wrap the text.

With pd.set_option('display.max_colwidth', 0), the text is wrapped poorly, like this:

Example 1

But with the following code, the text is wrapped to whatever column width you choose:

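from IPython.display import display, HTML

def wrap_df_text(df):
    # to_html() writes each newline inside a cell as a literal "\n";
    # swap it for an HTML line break and render the result
    display(HTML(df.to_html().replace("\\n", "<br>")))

# Newline at most every 30 characters. Long cells are still cut off with "..."
# unless display.max_colwidth has been raised (e.g. to 0, as above).
df['user_agent'] = df['user_agent'].str.wrap(30)
wrap_df_text(df)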

Example 2 - Better Execution
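A variant of the same idea that does not rely on how `to_html` renders newline characters: wrap each value with Python's `textwrap`, join the pieces with `<br>` yourself, and render with `escape=False` so the tags survive. The helper name `wrap_to_html` and the sample data are hypothetical. `display.max_colwidth` is lifted inside the helper because `to_html` otherwise truncates long cells with "...". Since `escape=False` also passes through any other HTML in the data, only use it on trusted values.

```python
import textwrap

import pandas as pd


def wrap_to_html(df, column, width):
    """Hypothetical helper: HTML for df with `column` hard-wrapped every `width` characters."""
    wrapped = df.assign(**{
        # textwrap.wrap also splits whitespace-free strings (break_long_words=True)
        column: df[column].map(lambda v: "<br>".join(textwrap.wrap(v, width)))
    })
    # Without this, to_html would cut long cells down to display.max_colwidth
    with pd.option_context("display.max_colwidth", None):
        # escape=False keeps the <br> tags as real HTML
        return wrapped.to_html(escape=False)


df = pd.DataFrame({"user_agent": ["x" * 75]})
html = wrap_to_html(df, "user_agent", 30)
# In Jupyter: from IPython.display import HTML, display; display(HTML(html))
```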

Julian

You can create a new column with the first 100 characters of the data (note that this truncates the text rather than wrapping it):

data['new_column'] = [i[:100] for i in data['old_column']]
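The same truncation can be written with the vectorised `.str` accessor, which passes missing values through instead of raising a `TypeError` the way `i[:100]` would on `NaN`. A sketch using the answer's column names with made-up data:

```python
import pandas as pd

# Hypothetical data: one long value, one short value, one missing value
data = pd.DataFrame({"old_column": ["a" * 150, "short", None]})

# Slice every string to its first 100 characters; missing values stay missing
data["new_column"] = data["old_column"].str[:100]
```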
Pato Navarro

For DataFrame visualization in Jupyter Notebook I would recommend using the Styler class. It relies on CSS, which gives you a lot of flexibility out of the box.

Since you need to apply the same style to every row, you can use the Styler.set_properties method, which applies the same CSS properties to every cell (or to the columns given in subset).

Here is an example using the text-wrapping CSS properties I took from Mozilla's web docs.

import pandas as pd

df = pd.DataFrame(
    [['Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0; '
      'GomezAgent 3.0) like Gecko']], 
    columns=['user_agent']
)

df.style.set_properties(
    **{
        'inline-size': '10px',
        'overflow-wrap': 'break-word',
    }, 
    subset='user_agent'
)

Picture of output in Jupyter Notebook

You can find more examples of how to control pandas DataFrame styling here: https://pandas.pydata.org/docs/user_guide/style.html.

vilozio

If you don't mind solving this before you put the data into a DataFrame, you can do it as described here. In your particular case, if you want each line to be 10 characters long, you would have:

import pandas as pd

# Input
line = 'Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0; GomezAgent 3.0) like Gecko'
n = 10

# Split the string into chunks of n characters
line = [line[i:i+n] for i in range(0, len(line), n)]

# The rest is easy: one chunk per row
df = pd.DataFrame(line)
print(df)


Without the white spaces, you'll get:


By the way, the white space at the beginning of the last row appears because the last chunk has fewer than 10 characters, unlike the preceding rows, and the column is right-aligned. In Jupyter you can fix this with df.style.set_properties(**{'text-align': 'left'}):

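If the strings are already in a DataFrame column, the same fixed-width split can be applied per row and then expanded into one row per chunk with `explode`. The original index repeats, so you can still tell which value each piece came from. A sketch with a hypothetical column name `line`:

```python
import pandas as pd

n = 10  # characters per chunk, as in the answer
df = pd.DataFrame({"line": [
    "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0; GomezAgent 3.0) like Gecko",
]})

# Split each value into n-character pieces, then give each piece its own row
chunks = df["line"].map(lambda s: [s[i:i + n] for i in range(0, len(s), n)]).explode()
print(chunks)
```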

vestland

If this is only for ad-hoc, temporary display in Jupyter, you can simply insert whitespace every 100 characters:

chunk_size = 100

data['new_column'] = [' '.join(val[i:i + chunk_size] for i in range(0, len(val), chunk_size))
                      for val in data['old_column']]
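The same comprehension can be packaged as a small function. `insert_breaks` is a hypothetical name, and the `sep` parameter is an addition: it lets you try a zero-width space (`"\u200b"`) instead of a regular space. Browsers treat a zero-width space as a line-break opportunity without showing a visible gap, but whether the notebook actually wraps there depends on its table CSS, so treat that variant as something to try rather than a guarantee.

```python
import pandas as pd


def insert_breaks(value, chunk_size=100, sep=" "):
    """Hypothetical helper: put `sep` between consecutive `chunk_size`-character pieces."""
    return sep.join(value[i:i + chunk_size] for i in range(0, len(value), chunk_size))


data = pd.DataFrame({"old_column": ["a" * 250]})  # hypothetical data
data["new_column"] = data["old_column"].map(insert_breaks)
# Variant with a zero-width space, which adds no visible gap
data["zw_column"] = data["old_column"].map(lambda v: insert_breaks(v, sep="\u200b"))
```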

That said, it looks like the underlying problem is that multiple features are collapsed into a single column. It's hard to say without seeing your larger dataset, but if the values all follow the same pattern, I'd strongly suggest splitting this out into multiple features (browser, browser version, OS, OS version, etc.), which will make any further work with this dataset easier.
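As a rough illustration of splitting the column into features, the parenthesised platform section of a user-agent string can be pulled out with `str.extract` and split on `"; "`. The regex, the `platform` name, and the assumption that every value has this layout are all hypothetical; real user-agent strings vary a lot.

```python
import pandas as pd

df = pd.DataFrame({"user_agent": [
    "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0; GomezAgent 3.0) like Gecko",
]})

# Grab the text inside the first pair of parentheses as a named group
platform = df["user_agent"].str.extract(r"\((?P<platform>[^)]*)\)")["platform"]

# One column per "; "-separated token (assumes this layout holds for every row)
tokens = platform.str.split("; ", expand=True)
print(tokens)
```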

Derek O
mr_snuffles
    1) The whole `data['new_column'] = ` line produces syntax error 2) 'string' is undefined! Do you check your code before publishing it ??? This answer merits 100 downvotes, but I don't like downvoting. – Apostolos Aug 08 '20 at 17:31
  • + to @Apostolos comment – rjurney Dec 24 '20 at 19:57