I need to get the schema from a CSV file (the column names and datatypes). Here is how far I have got:
from pyspark.sql import Row

sc = spark.sparkContext  # SparkContext from the active SparkSession

l = [('Alice', 1)]
Person = Row('name', 'age')
rdd = sc.parallelize(l)
person = rdd.map(lambda r: Person(*r))
df2 = spark.createDataFrame(person)
print(df2.schema)
#StructType(List(StructField(name,StringType,true),StructField(age,LongType,true)))
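For a real CSV file, the same schema can be produced by letting Spark infer it on read; a minimal sketch, assuming a hypothetical file people.csv with a header row:

# Hypothetical path; header and inferSchema are standard spark.read.csv options.
df_csv = spark.read.csv("people.csv", header=True, inferSchema=True)
print(df_csv.schema)  # StructType with the inferred field names and types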
I want to extract the values name and age along with StringType and LongType; however, I don't see any such method on StructType.
There is a toDDL method on StructType in Scala, but the same is not available in Python.
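In PySpark the names and types can be read directly off the fields attribute of StructType; a minimal sketch against the df2 built above:

# Each StructField carries a name and a dataType attribute.
for field in df2.schema.fields:
    print(field.name, field.dataType)  # e.g. name StringType / age LongType

# dtypes gives the same information as (name, type-string) pairs:
print(df2.dtypes)  # [('name', 'string'), ('age', 'bigint')]

If the DDL string itself is wanted, one workaround is to call toDDL on the underlying JVM schema object, though _jdf is an internal attribute and may change between versions:

print(df2._jdf.schema().toDDL())  # e.g. `name` STRING,`age` BIGINT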
This is an extension of the question below, where I already got help; however, I wanted to create a new thread: Get dataframe schema load to metadata table
Thanks for the reply. I am updating with the full code:
import pyspark  # only run after findspark.init()
from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.sql.catalogImplementation", "in-memory") \
    .getOrCreate()
sc = spark.sparkContext

# Build a one-row DataFrame from an RDD of Rows
l = [('Alice', 1)]
Person = Row('name', 'age')
rdd = sc.parallelize(l)
person = rdd.map(lambda r: Person(*r))
df2 = spark.createDataFrame(person)

# dtypes returns a list of (column name, type string) tuples,
# which can itself be turned into a DataFrame
df3 = df2.dtypes
df1 = spark.createDataFrame(df3, ['colname', 'datatype'])
df1.show()

# Expose the schema DataFrame to Spark SQL
df1.createOrReplaceTempView("test")
spark.sql('''select * from test''').show()
Output:
+-------+--------+
|colname|datatype|
+-------+--------+
| name| string|
| age| bigint|
+-------+--------+
+-------+--------+
|colname|datatype|
+-------+--------+
| name| string|
| age| bigint|
+-------+--------+
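From here, loading the schema into a metadata table is just a write; a minimal sketch, assuming a writable warehouse and using the hypothetical table name schema_metadata:

# Hypothetical table name; overwrite replaces any earlier run.
df1.write.mode("overwrite").saveAsTable("schema_metadata")
spark.table("schema_metadata").show()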