I have two DataFrames:
df1 =
+--------------+
|questions     |
+--------------+
|[Q1, Q2]      |
|[Q4, Q6, Q7]  |
|...           |
+--------------+
df2 =
+---+---+---+---+---+---+-----+---+
| Q1| Q2| Q3| Q4| Q6| Q7| ... |Q25|
+---+---+---+---+---+---+-----+---+
|  1|  0|  1|  0|  0|  1| ... |  1|
+---+---+---+---+---+---+-----+---+
I'd like to add a new column to the first DataFrame containing the values of all the df2 columns listed in df1.questions.
Expected result:
df1 =
+--------------+-----------+
|questions     |values     |
+--------------+-----------+
|[Q1, Q2]      |[1, 0]     |
|[Q4, Q6, Q7]  |[0, 0, 1]  |
|...           |           |
+--------------+-----------+
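To pin down what I mean, here is the same lookup written in plain Python (no Spark involved). The names `df2_row`, `df1_questions` and `pick_values` are made up for this illustration, and the data only covers the columns shown above:

```python
# Plain-Python illustration of the expected result (not Spark code).
# df2_row stands in for the single row of df2 (only the columns shown above).
df2_row = {"Q1": 1, "Q2": 0, "Q3": 1, "Q4": 0, "Q6": 0, "Q7": 1}

# df1_questions stands in for the df1.questions column.
df1_questions = [["Q1", "Q2"], ["Q4", "Q6", "Q7"]]


def pick_values(questions, row):
    """Return the values of `row` for each column name in `questions`, in order."""
    return [row[q] for q in questions]


values = [pick_values(qs, df2_row) for qs in df1_questions]
print(values)  # [[1, 0], [0, 0, 1]]
```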
When I do:
from pyspark.sql import functions as F
cols_to_link = ['Q1', 'Q2']
df2 = df2.select(cols_to_link)
df2 = df2.withColumn('value', F.concat_ws(", ", *df2.columns))
the additional column is what I want, but I can't get the same result when combining the two DataFrames.
It also works when I stay on df2 and only read the column names from df1:
df2 = df2.select([c for c in df1.select('questions').collect()[0][0]])
df2 = df2.withColumn('value', F.concat_ws(", ", *df2.columns))
But not when I start from df1:
df1 = df1\
    .withColumn('value', F.concat_ws(", ", *df2.select([col for col in df1.select('questions').collect()])))
Where am I going wrong?