This can be a working solution for you - use the built-in array function array_contains()
instead of looping through every item. The only preparation the solution needs is to convert the string column into an array.
Create the DataFrame Here
from pyspark.sql import functions as F
from pyspark.sql import types as T
df = spark.createDataFrame(
    [(1, "This is a Horse"), (2, "Monkey Loves trees"),
     (3, "House has a tree"), (4, "The Ocean is Cold")],
    ["col1", "col2"],
)
df.show(truncate=False)
Output
+----+------------------+
|col1|col2              |
+----+------------------+
|1   |This is a Horse   |
|2   |Monkey Loves trees|
|3   |House has a tree  |
|4   |The Ocean is Cold |
+----+------------------+
Logic Here - convert the string column to ArrayType using split(), then flag and keep the matching rows
df = df.withColumn("col2", F.split("col2", " "))
df = df.withColumn("array_filter", F.when(F.array_contains("col2", "This"), True).when(F.array_contains("col2", "tree"), True))
df = df.filter(F.col("array_filter") == True)
df.show(truncate=False)
Output
+----+---------------------+------------+
|col1|col2                 |array_filter|
+----+---------------------+------------+
|1   |[This, is, a, Horse] |true        |
|3   |[House, has, a, tree]|true        |
+----+---------------------+------------+
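As a side note, the two when() branches can also be collapsed into a single boolean expression and passed straight to filter(), skipping the helper column. A minimal sketch of that variant, reusing the df whose col2 has already been split into an array:

# Same result in one step: OR the two membership checks together
df.filter(
    F.array_contains("col2", "This") | F.array_contains("col2", "tree")
).show(truncate=False)

Keep in mind that array_contains() checks for exact element matches, which is why "tree" does not match the "trees" in row 2; if partial matches should count too, you would need a different approach (e.g. the higher-order function F.exists() with a lambda, available in Spark 3.1+).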