
I need to define the types of columns within `spark.createDataFrame()`. For example, I need to set the types of the columns Age and Weight, and I'm currently using the following code:

from pyspark.sql.types import IntegerType, FloatType

Age1 = spark.createDataFrame(df['Age'], IntegerType())
Weight1 = spark.createDataFrame(df['Weight'], FloatType())

How can I do it inside a single spark.createDataFrame instead of creating two?
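A minimal sketch of one way to do this, assuming `df` is a pandas DataFrame (as confirmed in the comments below): pass an explicit `StructType` schema to a single `spark.createDataFrame` call, with a pandas-only type cast as a fallback since Spark will then infer matching types on its own. The column names and sample values here are illustrative.

```python
import pandas as pd

# Illustrative pandas DataFrame standing in for the question's `df`
pdf = pd.DataFrame({"Age": [25, 30], "Weight": [70.5, 80.2]})

try:
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, IntegerType, FloatType

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # One createDataFrame call with an explicit schema for both columns
    schema = StructType([
        StructField("Age", IntegerType(), True),
        StructField("Weight", FloatType(), True),
    ])
    sdf = spark.createDataFrame(pdf, schema)
    sdf.printSchema()
except ImportError:
    # Fallback sketch when pyspark is unavailable: cast in pandas first;
    # spark.createDataFrame(pdf) would then infer int/float types directly
    pdf = pdf.astype({"Age": "int32", "Weight": "float32"})
```

The same effect can also be reached by casting after creation with `sdf.withColumn("Age", sdf["Age"].cast("int"))`, as suggested in the comments.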

pault
Hugo Coras
    What's `df` here? Is it a pandas DataFrame? If so you just need to do `spark.createDataFrame(df[["Age", "Weight"]])` because you can pass in a pandas DataFrame. If the types come in wrong, you can cast them after. – pault Jan 10 '19 at 14:57
  • This is a possible dupe of [Convert between spark.SQL DataFrame and pandas DataFrame](https://stackoverflow.com/questions/41826553/convert-between-spark-sql-dataframe-and-pandas-dataframe), [Converting Pandas dataframe into Spark dataframe error](https://stackoverflow.com/questions/37513355/converting-pandas-dataframe-into-spark-dataframe-error), and [how to change a Dataframe column from String type to Double type in pyspark](https://stackoverflow.com/questions/32284620/how-to-change-a-dataframe-column-from-string-type-to-double-type-in-pyspark). – pault Jan 10 '19 at 15:02
  • Yes, df is a Pandas DataFrame – Hugo Coras Jan 10 '19 at 15:03
  • Possible duplicate of [Convert between spark.SQL DataFrame and pandas DataFrame](https://stackoverflow.com/questions/41826553/convert-between-spark-sql-dataframe-and-pandas-dataframe) – pault Jan 10 '19 at 15:06

0 Answers