I am reading a CSV file with PySpark. When I display the resulting DataFrame in a Jupyter notebook, the header appears to contain special characters. How can I display the data without these characters? The output is also misaligned, as you can see in the picture. How can I display the data as a proper table, without using pandas?

py_df = spark.read.option('header', 'true').csv(r"E:\Data files\Amazon e-commerce data.csv")

enter image description here

Talha Tayyab
Jamal Butt

2 Answers

Pass truncate=False to show() so that long values are not cut off:

py_df = spark.read.option('header', 'true').csv(r"E:\Data files\Amazon e-commerce data.csv")
py_df.show(truncate=False)  # show() returns None, so call it separately instead of assigning its result

show() prints only 20 rows by default. To see more, pass n, for example n=1000 for 1000 rows.
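As a minimal sketch of combining these options, the helper below (the name show_wide is hypothetical, not part of PySpark) also sets vertical=True, which prints each row as a block of "column | value" lines. That may help when rows are too wide to fit on one line. It assumes df is a Spark DataFrame such as py_df.

```python
def show_wide(df, n=20):
    """Print up to n rows of a Spark DataFrame without cutting values short."""
    # truncate=False prints full cell values instead of cutting them at 20 characters.
    # vertical=True prints one field per line, so wide rows do not wrap.
    df.show(n=n, truncate=False, vertical=True)
```

For example, show_wide(py_df, n=5) would print the first five rows, one field per line.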

Talha Tayyab

You have too many columns to fit on one line, so the lines wrap. You can limit the number of columns that you show with a .select() before the show:

py_df.withColumn("Double the Price", py_df["price"] * 2).select(["price", "Double the Price"]).show(n=2)
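If you want to see every column rather than a hand-picked few, one possible extension of the same idea is to print the DataFrame in several narrow tables. The sketch below uses two hypothetical helpers (column_chunks and show_in_chunks are illustrative names, not PySpark functions) and assumes df is a Spark DataFrame; the chunk size of 5 is an arbitrary default.

```python
def column_chunks(columns, size):
    """Split a list of column names into consecutive groups of at most `size`."""
    return [columns[i:i + size] for i in range(0, len(columns), size)]


def show_in_chunks(df, size=5, n=5):
    """Print the first n rows once per group of columns, keeping each table narrow."""
    for chunk in column_chunks(df.columns, size):
        # select() keeps only this group of columns before printing
        df.select(*chunk).show(n=n, truncate=False)
```

Calling show_in_chunks(py_df, size=4, n=3) would print three rows for the first four columns, then three rows for the next four, and so on.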

Jupyter renders pandas DataFrames as HTML tables, which may be easier to read. If that is acceptable, you could convert the first rows of the Spark DataFrame to a pandas DataFrame:

py_df.withColumn("Double the Price", py_df["price"] * 2).limit(100).toPandas()
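Since the question asks to avoid pandas, another option that may be worth trying is Spark's eager evaluation setting, which lets a notebook render a Spark DataFrame as an HTML table when it is the last expression in a cell. The sketch below assumes spark is an active SparkSession on Spark 2.4 or later; the helper name enable_notebook_tables is hypothetical.

```python
def enable_notebook_tables(spark, max_rows=20):
    """Turn on eager evaluation so notebooks render Spark DataFrames as HTML tables."""
    # With this setting enabled, a DataFrame left as the last expression in a cell is displayed as a table
    spark.conf.set("spark.sql.repl.eagerEval.enabled", "true")
    # Upper limit on the number of rows included in the rendered table
    spark.conf.set("spark.sql.repl.eagerEval.maxNumRows", str(max_rows))
```

After calling enable_notebook_tables(spark), evaluating py_df on its own line at the end of a cell (without calling show()) should display it as a table.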
fskj