
I want to convert a string column from a JSON file into an integer using PySpark.

df1.select(df1["`result.price`"]).dtypes

Out[15]: [('result.price', 'string')]
df1 = df1.withColumn(df1.select(df1["`result.price`"]), F.col(df1.select(df1["`result.price`"])).cast(T.IntegerType()))

'DataFrame' object has no attribute '_get_object_id'

1 Answer


If you want to modify inline:

Since you are trying to modify the data type of a nested struct field, I think you need to rebuild the struct and apply the new StructType.

Take a look at this answer: https://stackoverflow.com/a/63270808/2956135
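For reference, here is a minimal sketch of that approach, assuming `result` is a struct column containing a `price` field. It rebuilds the struct, casting `price` and keeping the other fields (note this moves `price` to the end of the struct):

from pyspark.sql import functions as F
from pyspark.sql import types as T

# Rebuild the `result` struct: keep every field except `price`, then append
# `price` cast to an integer. (Assumes `result` is a StructType column.)
result_fields = df1.schema['result'].dataType.fields
df1 = df1.withColumn(
    'result',
    F.struct(
        *[F.col(f'result.{f.name}').alias(f.name) for f in result_fields if f.name != 'price'],
        F.col('result.price').cast(T.IntegerType()).alias('price'),
    ),
)

# On Spark 3.1+, Column.withField does the same in one call:
# df1 = df1.withColumn('result', F.col('result').withField('price', F.col('result.price').cast(T.IntegerType())))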

If you are okay with extracting to a different column:

df1 = df1.withColumn('price', F.col('result.price').cast(T.IntegerType()))
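This assumes the usual import aliases for pyspark.sql.functions and pyspark.sql.types; you can then confirm the new column's type:

from pyspark.sql import functions as F
from pyspark.sql import types as T

df1 = df1.withColumn('price', F.col('result.price').cast(T.IntegerType()))
df1.select('price').dtypes   # [('price', 'int')]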

Why does your line give an error?

There are a few mistakes in this syntax.

df1 = df1.withColumn(df1.select(df1["`result.price`"]), F.col(df1.select(df1["`result.price`"])).cast(T.IntegerType()))

First, the first argument of withColumn has to be a string: the name of the column you want to save the result as.

Second, F.col's argument has to be a string column name or a reference to the column, not a DataFrame. Passing df1.select(...), which is a DataFrame, is what triggers the '_get_object_id' error.

So the syntax below does not throw an error; however, the cast value is saved to a new top-level column (literally named 'result.price'), not back into the nested field.

df1 = df1.withColumn('result.price', F.col('result.price').cast(T.IntegerType()))
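As a quick check (assuming `result` is a struct with a `price` field): the nested field is untouched, and the cast value lives in a new top-level column whose dotted name needs backticks:

df1.select('result.*').dtypes                # 'price' inside the struct is still a string
df1.select(F.col("`result.price`")).dtypes   # [('result.price', 'int')] -- the new column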