I have a PySpark DataFrame that looks like this:
C C1 C2 C3
1 2 3 4
I want to add a nested column, which will make that column of the DataFrame a JSON string or an object (I'm not sure of the correct term for this). It should collect the information from the other columns of the same row:
C C1 C2 C3 V
1 2 3 4 {"C:1", "C1:2", "C2:3", "C3:4"}
I have tried the approach from How to add a nested column to a DataFrame, but I don't know the equivalent PySpark syntax (that question is in Scala), and its solution looks like it only works for one row; I need to do this for hundreds of millions of rows.
I have tried
df2 = df.withColumn("V", struct("V.*", col("C1").as('C1')))
but this raises a syntax error (presumably because as is a reserved word in Python).
Edit: I would not say this question is a duplicate of pyspark convert row to json with nulls, because the solution a user posted here, which solved my problem, is not posted there.
How can I build that nested column V from the rest of the columns in the same row?