The code below works in Spark with Scala:
scala> val ar = Array("oracle", "java")
ar: Array[String] = Array(oracle, java)
scala> df.withColumn("tags", lit(ar)).show(false)
+------+---+----------+----------+--------------+
|name |age|role |experience|tags |
+------+---+----------+----------+--------------+
|John |25 |Developer |2.56 |[oracle, java]|
|Scott |30 |Tester |5.2 |[oracle, java]|
|Jim |28 |DBA |3.0 |[oracle, java]|
|Mike |35 |Consultant|10.0 |[oracle, java]|
|Daniel|26 |Developer |3.2 |[oracle, java]|
|Paul |29 |Tester |3.6 |[oracle, java]|
|Peter |30 |Developer |6.5 |[oracle, java]|
+------+---+----------+----------+--------------+
How do I get the same behavior in PySpark? I tried the following, but it fails with a Java error:
from pyspark.sql.types import *
from pyspark.sql.functions import lit

tag = ["oracle", "java"]
df2.withColumn("tags", lit(tag)).show()
: java.lang.RuntimeException: Unsupported literal type class java.util.ArrayList [oracle, java]