I have a Spark DataFrame with columns id and articles, and three lists a_list, b_list, and c_list, as below.

a_list = [4, 10]
b_list = [11, 3]
c_list = [3, 6]

df = spark.createDataFrame([(1, 4), (2, 3), (5, 6)], ("id", "articles"))

I want to add a column Found that says which list (a_list, b_list, or c_list) contains the value in the articles column.

Currently, I can only do this when there are two lists, with the code below.

import pyspark.sql.functions as func
# Rows matching a_list are labelled; every other row falls through to 'Found in b_list'
df.withColumn('Found', func.when(df.articles.isin(a_list), 'Found in a_list').otherwise('Found in b_list'))

How can I extend this to more than two lists?

Expected output

+---+--------+---------------+
| id|articles|          Found|
+---+--------+---------------+
|  1|       4|Found in a_list|
|  2|       3|Found in b_list|
|  5|       6|Found in c_list|
+---+--------+---------------+
