I am creating a PySpark DataFrame by selecting a column from another DataFrame, converting it to an RDD, zipping it with an index, and converting the result back to a DataFrame:
df_tmp=o[1].select("value").rdd.zipWithIndex().toDF()
o[1] is a DataFrame. Its "value" column looks like this:
+-----+
|value|
+-----+
|    0|
|    0|
|    0|
+-----+
o[1].printSchema()
root
 |-- value: integer (nullable = true)
In this process, "value" ends up wrapped in extra square brackets (it becomes a struct column):
+---+---+
| _1| _2|
+---+---+
|[0]|  0|
|[0]|  1|
+---+---+
df_tmp.printSchema():
root
 |-- _1: struct (nullable = true)
 |    |-- value: long (nullable = true)
 |-- _2: long (nullable = true)
This causes problems when I write to a Hive table with saveAsTable(): the values are stored as {"value":0}, but I just want the plain value 0.
How can I get rid of the extra brackets in this DataFrame so that plain integer values are written to the Hive table?
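
For reference, a minimal sketch of one possible fix. The struct appears because zipWithIndex() pairs each whole Row with its index, so toDF() sees (Row, index) tuples. Unwrapping the Row before calling toDF() should give flat columns. The helper names flatten_pair and add_index, and the column name "index", are made up for illustration and are not part of PySpark.

```python
def flatten_pair(pair):
    """Turn a (Row(value=v), idx) pair from zipWithIndex() into the flat tuple (v, idx)."""
    row, idx = pair
    # A Row behaves like a tuple, so row[0] is the single selected column's value.
    return (row[0], idx)


def add_index(df, column="value", index_name="index"):
    """Return a DataFrame with `column` and its zipWithIndex position as flat columns.

    Hypothetical helper: `df` is assumed to be an existing PySpark DataFrame.
    """
    return (
        df.select(column)
        .rdd.zipWithIndex()      # yields (Row, index) pairs
        .map(flatten_pair)       # unwrap the Row so no struct is inferred
        .toDF([column, index_name])
    )


# Hypothetical usage with the DataFrame from the question:
# df_tmp = add_index(o[1])
# df_tmp.printSchema()
```

If df_tmp has already been built as in the question, selecting the nested field should also flatten it, for example df_tmp.select("_1.value", "_2"). In both cases the value column is likely to be inferred as long rather than integer, so a cast such as df_tmp["value"].cast("int") may be needed if the Hive column must be an int.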