I have a Spark DataFrame with columns id and articles, and three lists a_list, b_list and c_list, as below:
a_list = [4, 10]
b_list = [11, 3]
c_list = [3, 6]
df = spark.createDataFrame([(1, 4), (2, 3), (5, 6)], ("id", "articles"))
I want to add a column Found that says which list (a_list, b_list or c_list) contains the value in the articles column.
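Note that 3 appears in both b_list and c_list, yet the expected output labels it b_list, so the lists appear to be checked in order (a, b, c) with the first match winning. A plain-Python sketch of that assumed rule, independent of Spark (the helper name found_label is hypothetical):

```python
# Lists from the question, in the assumed priority order a -> b -> c.
# Python 3.7+ dicts preserve insertion order, so iteration follows that priority.
lists = {
    "a_list": [4, 10],
    "b_list": [11, 3],
    "c_list": [3, 6],
}


def found_label(article):
    """Return the label of the first list that contains `article`, or None."""
    for name, values in lists.items():
        if article in values:
            return f"Found in {name}"
    return None  # the value is in none of the lists


articles = [4, 3, 6]
labels = [found_label(a) for a in articles]
```

Under this rule, 3 is labelled b_list only because b_list is checked before c_list; reordering the lists would change the answer.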
Currently I can only do this for two lists, using the code below:
import pyspark.sql.functions as func

# Works for two lists only: any value not in a_list is assumed to be in b_list.
df.withColumn('Found', func.when(df.articles.isin(a_list), 'Found in a_list').otherwise('Found in b_list'))
How can I extend this to more than two lists?
Expected output:
+---+--------+---------------+
| id|articles|          Found|
+---+--------+---------------+
|  1|       4|Found in a_list|
|  2|       3|Found in b_list|
|  5|       6|Found in c_list|
+---+--------+---------------+