PySpark: Reading CSV data into a PySpark DataFrame. Why does it show special characters? Any way to display it in tabular form without using pandas?

Question

I am reading a CSV file with PySpark. After reading the CSV into a PySpark DataFrame and displaying the data in a Jupyter notebook, I see special characters in my header. Can anyone guide me on how to display the data without these special characters? Moreover, the data is not aligned, as you can see in the picture. How can I display the data in tabular form instead (without using pandas)?

py_df = spark.read.option('header', 'true').csv(r"E:\Data files\Amazon e-commerce data.csv")

score 2 · Accepted Answer · answered Oct 11 '21 at 15:59


Try passing truncate=False to show():

py_df = spark.read.option('header', 'true').csv(r"E:\Data files\Amazon e-commerce data.csv")
py_df.show(truncate=False)  # print full cell values instead of cutting them at 20 characters

By default show() prints only 20 rows. To see more, pass n, e.g. py_df.show(n=1000, truncate=False) for up to 1000 rows.

Talha Tayyab

fskj · Answer 2 · 2021-10-13T06:30:30.323

You have too many columns to fit on one line, so the output wraps. You can limit the number of columns shown by calling .select() before .show():

py_df.withColumn("Double the Price", py_df["price"] * 2).select(["price", "Double the Price"]).show(n=2)

Jupyter renders pandas DataFrames as HTML tables, so if pandas is acceptable after all, you could convert the first rows of the Spark DataFrame to a pandas DataFrame:

py_df.withColumn("Double the Price", py_df["price"] * 2).limit(100).toPandas()
