
I need to define the types of columns within `spark.createDataFrame()`. For example, I need to set the types of the columns Age and Weight, and I'm currently using the following code:

from pyspark.sql.types import IntegerType, FloatType

Age1 = spark.createDataFrame(df['Age'], IntegerType())
Weight1 = spark.createDataFrame(df['Weight'], FloatType())

How can I do it inside a single spark.createDataFrame instead of creating two?
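A minimal sketch of one way to do this, assuming `df` is a pandas DataFrame (as confirmed in the comments below): pass an explicit `StructType` schema to a single `spark.createDataFrame` call, with a pandas-only type cast as a fallback since Spark will then infer matching types on its own. The column names and sample values here are illustrative.

```python
import pandas as pd

# Illustrative pandas DataFrame standing in for the question's `df`
pdf = pd.DataFrame({"Age": [25, 30], "Weight": [70.5, 80.2]})

try:
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, IntegerType, FloatType

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # One createDataFrame call with an explicit schema for both columns
    schema = StructType([
        StructField("Age", IntegerType(), True),
        StructField("Weight", FloatType(), True),
    ])
    sdf = spark.createDataFrame(pdf, schema)
    sdf.printSchema()
except ImportError:
    # Fallback sketch when pyspark is unavailable: cast in pandas first;
    # spark.createDataFrame(pdf) would then infer int/float types directly
    pdf = pdf.astype({"Age": "int32", "Weight": "float32"})
```

The same effect can also be reached by casting after creation with `sdf.withColumn("Age", sdf["Age"].cast("int"))`, as suggested in the comments.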

pault
Hugo Coras
    What's `df` here? Is it a pandas DataFrame? If so you just need to do `spark.createDataFrame(df[["Age", "Weight"]])` because you can pass in a pandas DataFrame. If the types come in wrong, you can cast them after. – pault Jan 10 '19 at 14:57
  • This is a possible dupe of [Convert between spark.SQL DataFrame and pandas DataFrame](https://stackoverflow.com/questions/41826553/convert-between-spark-sql-dataframe-and-pandas-dataframe), [Converting Pandas dataframe into Spark dataframe error](https://stackoverflow.com/questions/37513355/converting-pandas-dataframe-into-spark-dataframe-error), and [how to change a Dataframe column from String type to Double type in pyspark](https://stackoverflow.com/questions/32284620/how-to-change-a-dataframe-column-from-string-type-to-double-type-in-pyspark). – pault Jan 10 '19 at 15:02
  • Yes, df is a Pandas DataFrame – Hugo Coras Jan 10 '19 at 15:03
  • Possible duplicate of [Convert between spark.SQL DataFrame and pandas DataFrame](https://stackoverflow.com/questions/41826553/convert-between-spark-sql-dataframe-and-pandas-dataframe) – pault Jan 10 '19 at 15:06

0 Answers