
I need some help with checking a column's data type in Spark.

I need to convert this PySpark functionality to Spark (Scala):

if dict(df.dtypes)['test_col'] == 'string':
    print("it is String type")
Code_rocks

2 Answers


To check the data type of a column, use the schema function.

Check the code below.

df
  .schema
  .filter(c => c.name == "test_col") // check your column
  .map(_.dataType.typeName)
  .headOption
  .getOrElse(None)

val dtype = df
  .schema
  .filter(a => a.name == "c1")
  .map(_.dataType.typeName)
  .headOption
  .getOrElse(None)

if (dtype == "string") println("it is String type")

Alternatively, use the dtypes function.

val dtype = df
  .dtypes
  .filter(_._1 == "c1")
  .map(_._2)
  .headOption
  .getOrElse(None)

if (dtype == "StringType") println("it is String type")
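
For reference, here is a minimal end-to-end sketch of how either check can be wired up; the SparkSession setup and the sample DataFrame below are assumptions added for illustration, not part of the answer:

import org.apache.spark.sql.SparkSession

// Assumed local session and sample data, just to make the snippet self-contained.
val spark = SparkSession.builder().master("local[*]").appName("dtype-check").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("b", 2)).toDF("c1", "c2") // c1 is a string column

// schema-based check: DataType.typeName is lower case ("string").
val viaSchema = df.schema.filter(_.name == "c1").map(_.dataType.typeName).headOption
if (viaSchema.contains("string")) println("it is String type")

// dtypes-based check: the second tuple element is the type's class name ("StringType").
val viaDtypes = df.dtypes.filter(_._1 == "c1").map(_._2).headOption
if (viaDtypes.contains("StringType")) println("it is String type")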
Srinivas

I have modified the answer in this post to include the data type in the schema.

Just pass your DataFrame's schema to the flatten function and it will give you each column name and its data type.

from pyspark.sql.types import ArrayType, StructType

def flatten(schema, prefix=None):
    """Return a flat list of (column name, data type) pairs for a possibly nested schema."""
    fields = []
    for field in schema.fields:
        name = prefix + '.' + field.name if prefix else field.name
        dtype = field.dataType
        # For arrays, descend into the element type.
        if isinstance(dtype, ArrayType):
            dtype = dtype.elementType

        # Recurse into nested structs; otherwise record the leaf column.
        if isinstance(dtype, StructType):
            fields += flatten(dtype, prefix=name)
        else:
            fields.append((name, dtype))

    return fields

mySchema = flatten(df.schema)
print("Schema is here", mySchema)
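
Since the question asks for the Scala side, a rough Scala equivalent of this flatten is sketched below; it is an illustration under the same assumptions (descend into array element types, recurse into structs), not part of the original answer:

import org.apache.spark.sql.types.{ArrayType, DataType, StructType}

// Walk a (possibly nested) schema and return (column name, data type) pairs.
def flatten(schema: StructType, prefix: Option[String] = None): Seq[(String, DataType)] =
  schema.fields.toSeq.flatMap { field =>
    val name = prefix.map(_ + "." + field.name).getOrElse(field.name)
    val dtype = field.dataType match {
      case ArrayType(elementType, _) => elementType // descend into the array's element type
      case other                     => other
    }
    dtype match {
      case st: StructType => flatten(st, Some(name)) // recurse into nested structs
      case leaf           => Seq(name -> leaf)       // leaf column: record name and type
    }
  }

val mySchema = flatten(df.schema)
println(s"Schema is here: $mySchema")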
user238607