
I have downloaded the data from Kaggle. The link is: https://www.kaggle.com/datasets/utkarshx27/motor-vehicle-collisions

I am using the following command to read the CSV:

data = spark.read.csv('Data/Motor_Vehicle_Collisions_-_Crashes.csv', inferSchema=True, header=True)

I am getting the following schema:

Pyspark corrupted data

Please help me resolve the issue above.

The following is the expected output, which I got from the pandas read_csv command: Expected format

1 Answer


It looks like your terminal window is too small; the data isn't corrupted at all. Your terminal wraps long lines, so the output looks weird, but here's how it looks for me (with data.show()):

result in a python notebook

EDIT: your problem is also described here:

pyspark show dataframe as table with horizontal scroll in ipython notebook
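
If you have to stay in a narrow terminal, one possible workaround is to print each row vertically so no line is wide enough to wrap. This is only a sketch: show_wide is a hypothetical helper name I made up, and it assumes df is a PySpark DataFrame such as the data variable from the question. printSchema() and the truncate/vertical arguments of show() are existing DataFrame methods and parameters.

```python
def show_wide(df, n=5):
    """Print the schema and the first n rows of a PySpark DataFrame,
    one field per line, so terminal line wrapping cannot scramble it.

    show_wide is a hypothetical helper name, not part of PySpark.
    """
    # printSchema() lists every column with its inferred type, one per line,
    # so its output does not depend on the terminal width.
    df.printSchema()

    # vertical=True prints each row as a block of "column | value" lines;
    # truncate=False keeps long cell values from being cut off.
    df.show(n=n, truncate=False, vertical=True)
```

With the DataFrame from the question you would call show_wide(data), or show_wide(data, n=2) to look at just two rows.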

titouanbou