
How can I write Spark SQL commands to return fields with case-insensitive results?

Example: Sample_DF below

+--------+
|  name  |
+--------+
|  Johnny|
|  Robert|
|  ROBERT|
|  robert|
+--------+

By default, Spark SQL seems to be case sensitive on the field you query:

spark.sql("select name from Sample_DF where status like '%Robert%'").show
+------+
|name  |
+------+
|Robert|
+------+

What can I do to make the above query case insensitive so that it returns the following, assuming there is a large list of Roberts in various lower/uppercase variations?

+--------+
|  name  |
+--------+
|  Robert|
|  ROBERT|
|  robert|
+--------+

As I understand it, Spark SQL does not support MSSQL's COLLATE.
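
For reference, a minimal sketch to reproduce Sample_DF (assuming PySpark with a local SparkSession):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Build the sample data and register it as a temp view so it can be
# queried via spark.sql(...)
Sample_DF = spark.createDataFrame(
    [("Johnny",), ("Robert",), ("ROBERT",), ("robert",)], ["name"]
)
Sample_DF.createOrReplaceTempView("Sample_DF")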

Techno04335

2 Answers


You can make all characters lowercase on both sides of the comparison.

spark.sql("select status from Sample_DF where lower(status) like '%' || lower('Robert') || '%'").show

There is also a built-in function for this; see How to change case of whole column to lowercase?
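
If you are on Spark 3.3 or later (an assumption about your version), there is also an ilike operator that does the case-insensitive match in one step:

# ilike is a case-insensitive LIKE, available since Spark 3.3
spark.sql("select name from Sample_DF where name ilike '%robert%'").show()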

nbk

If you want to take a look at all the names in the name column, you could use the lower function, which converts all characters to lowercase.

from pyspark.sql import functions as F

Sample_DF.select(F.lower('name')).show()
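
To actually filter the rows rather than just display the lowered values, you could apply lower inside the filter; a sketch with a hypothetical '%robert%' pattern:

# Keep only rows whose lowercased name contains 'robert'
Sample_DF.filter(F.lower(F.col('name')).like('%robert%')).show()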