
I have a DataFrame:

+-----+---------------+
|thing|         things|
+-----+---------------+
|  foo|[foo, bar, baz]|
|  bar|     [foo, baz]|
|  baz|          [baz]|
+-----+---------------+

And I want to check whether `thing` is in `things`, i.e. the expected output should be:

+-----+---------------+---------------+
|thing|         things|thing_in_things|
+-----+---------------+---------------+
|  foo|[foo, bar, baz]|           true|
|  bar|     [foo, baz]|          false|
|  baz|          [baz]|           true|
+-----+---------------+---------------+

How can I do this?

pfnuesel
  • Use [`expr`](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.expr) to [pass `thing` as an argument](https://stackoverflow.com/questions/51140470/using-a-column-value-as-a-parameter-to-a-spark-dataframe-function) to [`array_contains`](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.array_contains). For example: `df.withColumn("thing_in_things", expr("array_contains(things, thing)"))` – pault Oct 30 '19 at 20:57
  • Possible duplicate of [Using a column value as a parameter to a spark DataFrame function](https://stackoverflow.com/questions/51140470/using-a-column-value-as-a-parameter-to-a-spark-dataframe-function) – pault Oct 30 '19 at 20:58
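
For reference, a minimal runnable sketch of the approach pault describes above; the SparkSession setup and the DataFrame construction are assumptions that simply mirror the example in the question:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.getOrCreate()

# Rebuild the example DataFrame from the question.
df = spark.createDataFrame(
    [("foo", ["foo", "bar", "baz"]),
     ("bar", ["foo", "baz"]),
     ("baz", ["baz"])],
    ["thing", "things"],
)

# The Python array_contains helper historically expects a literal as its
# second argument; wrapping the call in expr() lets it reference the
# "thing" column instead.
df.withColumn("thing_in_things", expr("array_contains(things, thing)")).show()
# +-----+---------------+---------------+
# |thing|         things|thing_in_things|
# +-----+---------------+---------------+
# |  foo|[foo, bar, baz]|           true|
# |  bar|     [foo, baz]|          false|
# |  baz|          [baz]|           true|
# +-----+---------------+---------------+
```

On recent Spark releases, `pyspark.sql.functions.array_contains` may also accept a column for the value argument directly, but the `expr` form above works across versions.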

0 Answers