You can use rlike, as zero suggested, and Pattern.quote to handle the special regex characters. Suppose you have this DataFrame:
val df = Seq(
  ("hel?.o"),
  ("bbhel?.o"),
  ("hel?.oZZ"),
  ("bye")
).toDF("weird_string")
df.show()
+------------+
|weird_string|
+------------+
|      hel?.o|
|    bbhel?.o|
|    hel?.oZZ|
|         bye|
+------------+
Here's how to find all the strings that contain "hel?.o":
import java.util.regex.Pattern
df
  .withColumn("has_hello", $"weird_string".rlike(Pattern.quote("hel?.o")))
  .show()
+------------+---------+
|weird_string|has_hello|
+------------+---------+
|      hel?.o|     true|
|    bbhel?.o|     true|
|    hel?.oZZ|     true|
|         bye|    false|
+------------+---------+
You could also add the quote characters manually to get the same result:
df
  .withColumn("has_hello", $"weird_string".rlike("""\Qhel?.o\E"""))
  .show()
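The two forms give the same result because Pattern.quote simply wraps its argument in \Q and \E. Here is a minimal check in plain Scala, with no Spark needed (QuoteCheck is just an illustrative name, not part of any library):

```scala
import java.util.regex.Pattern

object QuoteCheck {
  // Pattern.quote wraps the literal in \Q ... \E so that every
  // character between the markers is matched verbatim.
  val quoted: String = Pattern.quote("hel?.o")

  // The hand-written form used with rlike above.
  val manual: String = """\Qhel?.o\E"""

  def main(args: Array[String]): Unit = {
    println(quoted)           // the quoted pattern as a string
    println(quoted == manual) // whether both forms are identical
  }
}
```

The manual form is fine for a fixed literal, but Pattern.quote is safer when the search string comes from user input, since it also copes with input that itself contains \E.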
If you don't escape the regex, ? and . are treated as metacharacters (an optional "l" followed by any single character), so none of the rows match:
df
  .withColumn("has_hello", $"weird_string".rlike("hel?.o"))
  .show()
+------------+---------+
|weird_string|has_hello|
+------------+---------+
|      hel?.o|    false|
|    bbhel?.o|    false|
|    hel?.oZZ|    false|
|         bye|    false|
+------------+---------+
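To see why every row comes back false, you can try the pattern with java.util.regex directly. This is only a sketch: it assumes rlike behaves like a Java regex "find" (a match anywhere in the string), and RegexCheck and containsMatch are hypothetical names made up for the example.

```scala
import java.util.regex.Pattern

object RegexCheck {
  // True if the regex matches anywhere inside s (find semantics),
  // which is assumed here to mirror what rlike does per row.
  def containsMatch(regex: String, s: String): Boolean =
    Pattern.compile(regex).matcher(s).find()

  def main(args: Array[String]): Unit = {
    val raw = "hel?.o"                 // ? and . act as metacharacters
    val literal = Pattern.quote(raw)   // every character is taken literally
    val inputs = Seq("hel?.o", "bbhel?.o", "hel?.oZZ", "bye", "hello")

    inputs.foreach { s =>
      val unquoted = containsMatch(raw, s)
      val quoted = containsMatch(literal, s)
      println(s + " unquoted=" + unquoted + " quoted=" + quoted)
    }
  }
}
```

Unquoted, the pattern reads as "he", an optional "l", any one character, then "o". That describes "hello" but not the literal text "hel?.o", which is why the quoted version is needed.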
See this post for more details.