Apache pyspark pandas

Question

I am new to Apache Spark. I created a schema and a data frame, and it shows me the result, but the output is so messy that I can hardly read the lines. So I want to show my result in pandas format. I attached a screenshot of my data frame result, but I don't know how to show it in pandas format.

Here's my code

from pyspark.sql.types import StructType, StructField, IntegerType
from pyspark.sql.types import * 
from IPython.display import display 
import pandas as pd 
import gzip

schema = StructType([StructField("crimeid", StringType(), True), 
                     StructField("Month", StringType(), True), 
                     StructField("Reported_by", StringType(), True),
                     StructField("Falls_within", StringType(), True), 
                     StructField("Longitude", FloatType(), True), 
                     StructField("Latitue", FloatType(), True), 
                     StructField("Location", StringType(), True),
                     StructField("LSOA_code", StringType(), True),
                     StructField("LSOA_name", StringType(), True),
                     StructField("Crime_type", StringType(), True),
                     StructField("Outcome_type", StringType(), True),
                    ])

df = spark.read.csv("crimes.gz",header=False,schema=schema)
df.printSchema()

PATH = "crimes.gz"
csvfile = spark.read.format("csv")\
.option("header", "false")\
.schema(schema)\
.load(PATH)
df1 =csvfile.show()

It shows the result like below:

[screenshot of the data frame output; image not included]

But I want this data in pandas form.

Thanks

Does this answer your question? [Convert a spark DataFrame to pandas DF](https://stackoverflow.com/questions/50958721/convert-a-spark-dataframe-to-pandas-df) — SMaZ, Dec 07 '20 at 22:50
You can also just paste it into any editor or Excel and it won't wrap. — jayrythium, Dec 08 '20 at 13:40
Maybe you can use csvfile.show(truncate=False); this will show your full output so you can read it more easily. — Sachin Tiwari, Jun 29 '22 at 06:54

score 0 · Answer 1 · edited Jul 11 '22 at 11:47

You can try showing the rows vertically (one column per line), or truncating long values if you like:

df.show(2, vertical=True)
df.show(2, truncate=4, vertical=True)
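Another way to keep `show()` output from wrapping is to print only a few columns at a time. This is not from the answer above, just a minimal sketch: `show_columns` is a hypothetical helper name, and it assumes `spark_df` is a PySpark DataFrame such as the `df` in the question.

```python
def show_columns(spark_df, columns, n=5):
    """Print the first n rows of the chosen columns without truncating values.

    Fewer columns means narrower rows, so the printed table is less likely to wrap.
    Like DataFrame.show(), this prints the table and returns None.
    """
    spark_df.select(*columns).show(n, truncate=False)
```

For example, `show_columns(df, ["crimeid", "Month", "Crime_type"])` would print just those three columns from the question's schema.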

answered Jul 05 '22 at 18:29 by IknewIt; edited Jul 11 '22 at 11:47 by Adrian Mole

hackerofjogos · Answer 2 · 2022-07-05T17:45:46.297

Please try:

from pyspark.sql.types import StructType, StructField, StringType, FloatType

schema = StructType([StructField("crimeid", StringType(), True), 
                     StructField("Month", StringType(), True), 
                     StructField("Reported_by", StringType(), True),
                     StructField("Falls_within", StringType(), True), 
                     StructField("Longitude", FloatType(), True), 
                     StructField("Latitue", FloatType(), True), 
                     StructField("Location", StringType(), True),
                     StructField("LSOA_code", StringType(), True),
                     StructField("LSOA_name", StringType(), True),
                     StructField("Crime_type", StringType(), True),
                     StructField("Outcome_type", StringType(), True),
                    ])

df = spark.read.csv("crimes.gz",header=False,schema=schema)
df.printSchema()

pandasDF = df.toPandas()  # convert the PySpark DataFrame to a pandas DataFrame (collects all rows to the driver)
print(pandasDF.head())  # print the first 5 rows
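Because `toPandas()` brings every row to the driver, it may be safer on a large file to limit the rows first. A minimal sketch under that assumption; `preview_as_pandas` is a hypothetical helper name, not part of PySpark, and `spark_df` is assumed to be a PySpark DataFrame such as `df` above.

```python
def preview_as_pandas(spark_df, n=5):
    """Return the first n rows of a Spark DataFrame as a pandas DataFrame.

    limit() is applied before toPandas(), so at most n rows are collected
    to the driver instead of the whole dataset.
    """
    return spark_df.limit(n).toPandas()
```

In a Jupyter notebook, ending a cell with `preview_as_pandas(df, 10)` should render the result as a regular pandas table.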

As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). — Jeremy Caney, Jul 03 '22 at 00:46
