I have a PySpark DataFrame (df) that I'd like to print as a nicely formatted table in my Jupyter notebook.

As per this post, I thought the following code would work:

import pandas as pd
from IPython.display import display, HTML

pandas_df = df.toPandas()

display(HTML(pandas_df.to_html()))

Unfortunately, this does not work. I get the following error:

ERROR - failed to write data to stream: <__main__.UnicodeDecodingStringIO object at 0x7f75c7a8e750>

Does anyone know how to resolve this issue?

Thanks!

LenL

1 Answer

Try the following:

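def printDF(inputDF):
    # Collect the Spark DataFrame to the driver as a pandas DataFrame
    newDF = inputDF.toPandas()
    from IPython.display import HTML
    # Return an HTML object; Jupyter renders it when it is the last expression in a cell
    return HTML(newDF.to_html())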

You can also move the import statement to the top of the notebook so that it runs once, instead of each time the function is called. Hope this helps.
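
Two things to keep in mind with this approach: toPandas() pulls every row to the driver, and a returned HTML object is only rendered when it is the last expression in a cell. Here is a minimal sketch of a variant that caps the row count and calls display() explicitly. The names spark_df_to_html, show_df and max_rows are my own, not part of PySpark or IPython, and I haven't confirmed that this avoids the stream error from the question.

```python
def spark_df_to_html(spark_df, max_rows=20):
    """Return an HTML table string for the first `max_rows` rows.

    `spark_df` is assumed to be a PySpark DataFrame (anything exposing
    limit() and toPandas() works); the helper name is hypothetical.
    """
    # limit() is applied before collecting, so at most max_rows rows
    # are brought to the driver by toPandas().
    pandas_df = spark_df.limit(max_rows).toPandas()
    # index=False hides the pandas row index in the rendered table.
    return pandas_df.to_html(index=False)


def show_df(spark_df, max_rows=20):
    """Render the table in the notebook, even from the middle of a cell."""
    # Imported here so the module loads outside Jupyter; requires IPython.
    from IPython.display import display, HTML
    display(HTML(spark_df_to_html(spark_df, max_rows)))
```

In a notebook you would then call show_df(df) or show_df(df, max_rows=50). Raising max_rows increases the amount of data collected to the driver, so keep it small for large DataFrames.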

alex_crow