None of the answers I found in existing questions about this have worked for me.
I have tried several versions. In all of them I start with this DataFrame:
dataFrame = spark.read.format("com.mongodb.spark.sql").load()
The output of dataFrame.printSchema() is:
root
 |-- SensorId: string (nullable = true)
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- _type: string (nullable = true)
 |-- device: string (nullable = true)
 |-- deviceType: string (nullable = true)
 |-- event_id: string (nullable = true)
 |-- gen_val: string (nullable = true)
 |-- lane_id: string (nullable = true)
 |-- system_id: string (nullable = true)
 |-- time: string (nullable = true)
After the DataFrame is created, I want to cast the column 'gen_val' (whose name is stored in the variable results.inputColumns) from String type to Double type. Different versions led to different errors.
Version #1
Code:
dataFrame = dataFrame.withColumn(results.inputColumns, dataFrame[results.inputColumns].cast('double'))
Using cast(DoubleType()) instead generates the same error.
Error:
AttributeError: 'DataFrame' object has no attribute 'cast'
Version #2
Code:
dataFrame = dataFrame.withColumn(results.inputColumns, dataFrame['gen_val'].cast('double'))
This option is not really relevant anyway, because the column name cannot be hard-coded.
Error:
dataFrame = dataFrame.withColumn(results.inputColumns, dataFrame['gen_val'].cast('double'))
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1502, in withColumn
File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o31.withColumn. Trace:
py4j.Py4JException: Method withColumn([class java.util.ArrayList, class org.apache.spark.sql.Column]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:272)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)