
I need to create a dataframe based on a set of column names and data types. But the data types are given as Python types (str, int, float, etc.), and I need to convert them to the StringType, IntegerType, etc. required by StructType/StructField.

I can write a simple mapping to do the job, but I'd like to know whether there is any automatic conversion for these types.
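For reference, the kind of hand-written mapping I mean is roughly the following (the column list here is just illustrative):

from pyspark.sql.types import StringType, IntegerType, FloatType, StructType, StructField

# Hand-rolled mapping from Python types to Spark SQL types
type_map = {str: StringType(), int: IntegerType(), float: FloatType()}

columns = [("city", str), ("country", str), ("population", int)]  # hypothetical input
schema = StructType([StructField(name, type_map[t], True) for name, t in columns])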

DataNoob
  • Can you provide a [reproducible example](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples)? There might be an easier way, but it's hard to tell without seeing exactly what you're trying to do. – pault Oct 15 '18 at 22:08
  • Below is an example, but I have the field names and types as Python types, i.e. str and int: schema = StructType([ StructField("city", StringType(), True), StructField("country", StringType(), True), StructField("population", IntegerType(), True)]) – DataNoob Oct 15 '18 at 22:28

2 Answers


I know it's been a while, but you can try the following:

from pyspark.sql.types import _parse_datatype_string

Then you can use it as follows:

_parse_datatype_string('int')  # returns pyspark's IntegerType()

NOTE: The type name has to be passed as a string.
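Building on this, a minimal sketch that turns a list of (name, type-string) pairs into a full schema; the column names and types here are assumptions, and note that in Spark 2.x this helper delegates parsing to the JVM, so it needs an active SparkContext:

from pyspark.sql.types import StructType, StructField, _parse_datatype_string

# Hypothetical input: column names with their types as strings
cols = [("city", "string"), ("country", "string"), ("population", "int")]

# Requires a running SparkContext, since _parse_datatype_string parses via the JVM
schema = StructType([StructField(name, _parse_datatype_string(t), True)
                     for name, t in cols])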

Reference: https://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/types.html

RobC

You can do that by using the following function:

>>> from pyspark.sql.types import _infer_type
>>> _infer_type([1.0, 2.0])
ArrayType(DoubleType,true)

If you have the type itself in the input, you can also do this:

>>> my_type = type(42)
>>> _infer_type(my_type())
LongType

Finally, if you only have a string describing the Python type, you can use this:

>>> from pydoc import locate
>>> _infer_type(locate('int')())
LongType
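Putting the last two pieces together, a sketch for the original question, assuming the input is a dict mapping column names to Python type names and that each type can be instantiated with no arguments:

from pydoc import locate
from pyspark.sql.types import StructType, StructField, _infer_type

cols = {"city": "str", "population": "int"}  # hypothetical input

# locate(t) resolves the type name to the class; calling it builds a sample
# value (e.g. int() == 0) for _infer_type to inspect
schema = StructType([StructField(name, _infer_type(locate(t)()), True)
                     for name, t in cols.items()])

As shown above, be aware that a Python int is inferred as LongType rather than IntegerType.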


programort