Question (score 42)

A pyspark.sql.DataFrame displays messily with DataFrame.show() in a Jupyter notebook: long lines wrap instead of scrolling.

[screenshot: df.show() output with wrapped lines]

but it displays fine with pandas.DataFrame.head: [screenshot: pandas head() output]

I tried these options:

import IPython
IPython.auto_scroll_threshold = 9999  # scroll long output instead of growing the cell

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"  # display every top-level expression, not just the last
from IPython.display import display

but with no luck. The scrolling does work, though, when the notebook is run inside the Atom editor with the Jupyter plugin:

[screenshot: scrollable output in Atom with the Jupyter plugin]

– muon

11 Answers

Answer (score 34)

This is a workaround:

spark_df.limit(5).toPandas().head()

I do not know the exact computational burden of this query, though; my understanding is that limit() is not expensive. Corrections welcome.
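As a rough illustration of why this is usually cheap (a sketch using a synthetic DataFrame; `spark_df` here is just a stand-in): limit(5) truncates the query plan before collection, so only a handful of rows ever reach the driver, whereas a bare toPandas() collects everything first.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("limit-demo").getOrCreate()
spark_df = spark.range(1_000_000)  # stand-in for a large DataFrame

# only 5 rows are shipped to the driver and converted to pandas:
print(spark_df.limit(5).toPandas())

# this would collect all 1,000,000 rows first, then take 5 -- avoid on big data:
# spark_df.toPandas().head()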

– muon
  • agreed that this is not a great solution - but I have not seen a "native" (non-pandas) alternative – WestCoastProjects Nov 18 '18 at 16:42
  • Note that `limit()` does not keep the order of the dataframe. – Louis Yang Dec 19 '18 at 19:32
  • @LouisYang If the DataFrame is sorted, yes it does. – rjurney Jan 05 '19 at 02:57
  • this solution is good only for small tables; otherwise it slows down the whole process – ilan_pinto Mar 31 '19 at 14:58
  • The performance seems not to be an issue, based on this single test with the `%%timeit` magic: `spark_df.limit(5).toPandas().head()` took *9.6 s* vs. `spark_df.show(5)` at *10.1 s*, running on JupyterLab 1.0.1 connected to a remote Databricks PySpark via DBConnect, with `spark_df` a medium-sized dataframe. – Ran Feldesh Jul 09 '19 at 15:24
  • What if you don't have Pandas installed and it isn't an option? – Ramin Melikov Oct 21 '20 at 15:59
  • display(spark_df) will show the records in tabular format. To see all the columns/rows, click on any one of the records; a scroll bar will appear. – Sourav Saha Jul 31 '21 at 07:10
Answer (score 29)

Just add (and execute)

from IPython.display import display, HTML

# override the notebook CSS so <pre> output scrolls instead of wrapping
display(HTML("<style>pre { white-space: pre !important; }</style>"))

And you'll get the df.show() output with a scrollbar. [screenshot: show() output with a horizontal scrollbar]

– jmPicaza
Answer (score 16)

If anyone's still facing the issue, it can be resolved by tweaking the page's styling with the browser's developer tools.

When you see this: [screenshot: df.show() output with wrapped lines]

Open the developer tools (F12), then inspect element (Windows: Ctrl+Shift+C, Mac: Cmd+Option+C). After this, click (select) the dataframe output shown in the picture above and uncheck the white-space attribute (see snapshot below). [screenshot: devtools Styles pane with the white-space rule unchecked]

You just need to change this setting once (unless you refresh the page).

This will show you the exact data natively, as is. No need to convert to pandas.
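For reference, unchecking that box amounts to overriding the notebook's white-space: pre-wrap rule. If you'd rather not repeat the devtools steps, a cell like this (same idea as the display(HTML(...)) answer above, scoped to output areas) applies the same override until the page is refreshed:

from IPython.display import display, HTML

# equivalent of unchecking white-space: pre-wrap in devtools,
# applied only to notebook output areas
display(HTML("<style>div.output_area pre { white-space: pre; }</style>"))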

– Vijay Jangir
  • This was perfect for a quick and dirty demo, thank you!! yes it breaks if reload, but perfect for a screencast – K.S. Jun 08 '20 at 04:06
Answer (score 10)

Just edit the css file and you are good to go.

  1. Open the jupyter notebook ../site-packages/notebook/static/style/style.min.css file.

  2. Search for white-space: pre-wrap;, and remove it.

  3. Save the file and restart jupyter-notebook.

Problem fixed. :)
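If you're not sure where that file lives, a quick way to locate it (a sketch; this layout applies to the classic notebook package, not JupyterLab):

import os
import notebook

# the classic notebook serves its CSS from <package>/static/style/style.min.css
css_path = os.path.join(os.path.dirname(notebook.__file__),
                        "static", "style", "style.min.css")
print(css_path)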

– Karan Singla (edited by B--rian)
Answer (score 1)

Adding to the answers given above by @karan-singla and @vijay-jangir, here is a handy one-liner to comment out the white-space: pre-wrap styling:

$ awk -i inplace '/pre-wrap/ {$0="/*"$0"*/"}1' $(dirname `python -c "import notebook as nb;print(nb.__file__)"`)/static/style/style.min.css

This translates as: use awk to update, in place, lines that contain pre-wrap, surrounding them with /* ... */ (i.e., commenting them out), in the style.min.css file found in your active Python environment.

This, in theory, can then be used as an alias if one uses multiple environments, say with Anaconda.
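If GNU awk isn't available, the same edit can be scripted in plain Python (a sketch with the same caveats: classic notebook only, and it must be re-run after upgrading the package):

import os
import notebook

css = os.path.join(os.path.dirname(notebook.__file__),
                   "static", "style", "style.min.css")
with open(css) as f:
    text = f.read()
# comment the rule out rather than deleting it, so it is easy to revert
with open(css, "w") as f:
    f.write(text.replace("white-space: pre-wrap;",
                         "/* white-space: pre-wrap; */"))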


– tallamjr
Answer (score 1)

Try display(dataframe_name); it renders a scrollable table.
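Note that in a plain Jupyter notebook, display() only renders a rich table when the object carries an HTML repr; with Spark 2.4+ you can give DataFrames one by enabling eager evaluation (the same setting used in the eagerEval answer further down), for example:

from pyspark.sql import SparkSession
from IPython.display import display

spark = SparkSession.builder.appName("display-demo").getOrCreate()
spark.conf.set("spark.sql.repl.eagerEval.enabled", True)  # Spark 2.4+

df = spark.createDataFrame([(1, "A"), (2, "B")], ["id", "label"])
display(df)  # now renders the DataFrame's HTML repr as a table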

– jyotiska (edited by dboy)
  • this did not work in Jupyter notebook for me. It works in Databricks notebooks, but the question is about Jupyter notebooks. – muon Aug 03 '21 at 21:16
  • Your answer has solved a very pertinent problem for me. I had been trying to download sample of data after performing some operations in Databricks and none of the answers on the internet seemed to work for me. Your answer creates a table whose sample of 100 records I can download. Thanks a ton. – AshwiniJ Nov 09 '22 at 11:54
Answer (score 0)

To be precise about what was said before: in the file anaconda3/lib/python3.7/site-packages/notebook/static/style/style.min.css there are two occurrences of white-space: nowrap;. You have to comment out the one in the samp rule, like this: samp { /*white-space: nowrap;*/ }, then save the file and restart Jupyter.

– nicanz
Answer (score 0)

This solution does not depend on pandas, it does not change the Jupyter settings, and it looks good (the scrollbar activates automatically).

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("My App").getOrCreate()
# Spark 2.4+: evaluate DataFrames eagerly and render them as HTML tables
spark.conf.set("spark.sql.repl.eagerEval.enabled", True)

data = [
  [1, 1, 'A'],
  [2, 2, 'A'],
  [3, 3, 'A'],
  [4, 3, 'B'],
  [5, 4, 'B'],
  [6, 5, 'C'],
  [7, 6, 'C']]
df = spark.sparkContext.parallelize(data).toDF(('column_1', 'column_2', 'column_3'))

# This will print a pretty table
df
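Continuing from the snippet above, two companion settings control how much of the table the eager repr shows (the defaults, per the Spark configuration docs, are 20 rows and 20-character cell truncation):

# show up to 50 rows in the HTML repr (default 20)
spark.conf.set("spark.sql.repl.eagerEval.maxNumRows", 50)
# allow 100 characters per cell before truncating (default 20)
spark.conf.set("spark.sql.repl.eagerEval.truncate", 100)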
– MachineLearner
Answer (score 0)

What worked for me: since I'm using an environment where I don't have access to the CSS files and wanted to do it in a cell, Jupyter magic commands gave a neat solution.

Found the solution at https://stackoverflow.com/a/63476260/11795760

Just paste in a cell:

%%html
<style>
div.output_area pre {
    white-space: pre;
}
</style>

It also works in Scala notebooks.

– MateoB27
Answer (score 0)

I would create a small function that converts the PySpark DataFrame to a pandas DataFrame and takes the head, and then call it like this:

Function

def display_df(df):
    return df.limit(5).toPandas().head()

Then call

display_df(spark_df)

Note that pandas must be installed for toPandas() to work.
– Frankenstine Joe
Answer (score -1)

I created this little function and it works fine:

from IPython.display import HTML

def printDf(sprkDF):
    newdf = sprkDF.toPandas()
    return HTML(newdf.to_html())

You can use it directly on your Spark queries if you like, or on any Spark dataframe:

printDf(spark.sql('''
select * from employee
'''))
– Mbhatt
  • but `pyspark.sql.DataFrame().toPandas().head()` works just fine without needing your HTML conversion (see question) ... and one wouldn't want to convert a big dataframe to pandas ... the workaround is to convert just the head to pandas – muon Jun 21 '17 at 16:05