How to change case of whole column to lowercase?

Question

I want to convert every value in a column of a Spark Dataset to lowercase.

        Input
        +------+--------------------+
        |ItemID|       Category name|
        +------+--------------------+
        |   ABC|BRUSH & BROOM HAN...|
        |   XYZ|WHEEL BRUSH PARTS...|
        +------+--------------------+

        Desired Output
        +------+--------------------+
        |ItemID|       Category name|
        +------+--------------------+
        |   ABC|brush & broom han...|
        |   XYZ|wheel brush parts...|
        +------+--------------------+

I tried collectAsList() and toString(), but that is slow and cumbersome for a very large dataset.

I also found a method called lower, but I couldn't work out how to use it with a Dataset. Please suggest a simple and efficient way to do this. Thanks in advance.

score 42 · Answer 1 · edited May 06 '19 at 19:34


I got it: use functions#lower (see the Javadoc).

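import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lower;

        String columnName = "Category name";
        // withColumn with an existing column name replaces that column in the returned Dataset
        src = src.withColumn(columnName, lower(col(columnName)));
        src.show();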

This replaces the old column with the lowercased one and keeps the rest of the Dataset unchanged.

        +------+--------------------+
        |ItemID|       Category name|
        +------+--------------------+
        |   ABC|brush & broom han...|
        |   XYZ|wheel brush parts...|
        +------+--------------------+
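For anyone who wants to try this end to end, here is a minimal sketch of the same approach as a complete Java program. It assumes Spark SQL is on the classpath and runs in local mode; the class name, the helper `lowercaseColumn`, and the sample category values are made up for illustration and are not the data from the question.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lower;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class LowercaseColumn {

    // Hypothetical helper: returns a new Dataset in which `columnName` is
    // lowercased and every other column is left as it was.
    static Dataset<Row> lowercaseColumn(Dataset<Row> src, String columnName) {
        return src.withColumn(columnName, lower(col(columnName)));
    }

    public static void main(String[] args) {
        // Local session for experimenting; a real job would get its master from spark-submit.
        SparkSession spark = SparkSession.builder()
                .appName("LowercaseColumn")
                .master("local[*]")
                .getOrCreate();

        StructType schema = new StructType()
                .add("ItemID", DataTypes.StringType)
                .add("Category name", DataTypes.StringType);

        // Made-up sample rows, not the rows from the question.
        List<Row> rows = Arrays.asList(
                RowFactory.create("ABC", "SAMPLE CATEGORY ONE"),
                RowFactory.create("XYZ", "SAMPLE CATEGORY TWO"));

        Dataset<Row> src = spark.createDataFrame(rows, schema);
        lowercaseColumn(src, "Category name").show();

        spark.stop();
    }
}
```

Because lower is evaluated by Spark on the executors, nothing is collected to the driver, which is what made the collectAsList() attempt slow.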

edited May 06 '19 at 19:34 by Joker · answered Apr 19 '17 at 16:31 by Shreeharsha

1

import org.apache.spark.sql.functions._ is better as you'll also need the import to use the col() function – user3389171 Mar 03 '20 at 16:57

score 26 · Accepted Answer · answered Apr 19 '17 at 16:22

Use the lower function from org.apache.spark.sql.functions.

For instance, in Scala (the $"..." column syntax needs import spark.implicits._ in scope):

df.select($"q1Content", lower($"q1Content")).show

The output:

+--------------------+--------------------+
|           q1Content|    lower(q1Content)|
+--------------------+--------------------+
|What is the step ...|what is the step ...|
|What is the story...|what is the story...|
|How can I increas...|how can i increas...|
|Why am I mentally...|why am i mentally...|
|Which one dissolv...|which one dissolv...|
|Astrology: I am a...|astrology: i am a...|
| Should I buy tiago?| should i buy tiago?|
|How can I be a go...|how can i be a go...|
|When do you use  ...|when do you use  ...|
|Motorola (company...|motorola (company...|
|Method to find se...|method to find se...|
|How do I read and...|how do i read and...|
|What can make Phy...|what can make phy...|
|What was your fir...|what was your fir...|
|What are the laws...|what are the laws...|
|What would a Trum...|what would a trum...|
|What does manipul...|what does manipul...|
|Why do girls want...|why do girls want...|
|Why are so many Q...|why are so many q...|
|Which is the best...|which is the best...|
+--------------------+--------------------+
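Since the question is about the Java API, here is a rough Java equivalent of the select above. It is only a sketch: the class name, the helper `withLowercasedCopy`, and the `_lower` alias suffix are assumptions for illustration, and the single sample row is borrowed from the output shown above.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lower;

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class SelectLowercase {

    // Hypothetical helper: selects the original column plus a lowercased copy.
    // The "<name>_lower" alias is a naming choice made up for this sketch.
    static Dataset<Row> withLowercasedCopy(Dataset<Row> df, String columnName) {
        return df.select(
                col(columnName),
                lower(col(columnName)).alias(columnName + "_lower"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("SelectLowercase")
                .master("local[*]")
                .getOrCreate();

        StructType schema = new StructType().add("q1Content", DataTypes.StringType);
        Dataset<Row> df = spark.createDataFrame(
                Arrays.asList(RowFactory.create("Should I buy tiago?")), schema);

        // show(false) prints full values instead of truncating them
        withLowercasedCopy(df, "q1Content").show(false);

        spark.stop();
    }
}
```

Unlike withColumn with the same name, select returns only the columns you list, so this form suits cases where you want to compare the original and lowercased values side by side.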

score 4 · Answer 3 · answered May 16 '22 at 12:38


You can do it like this in Scala:

import org.apache.spark.sql.functions._

val dfAfterLowerCase = dfInitial.withColumn("column_name", lower(col("column_name")))
dfAfterLowerCase.show()

answered May 16 '22 at 12:38 by Yauheni Leaniuk

score 3 · Answer 4 · answered Jul 11 '20 at 07:14

First, add the static import:

import static org.apache.spark.sql.functions.lower;

Then apply lower to the column wherever you need it. Here is an example:

.and(lower(df1.col("field_name")).equalTo("offeringname"))
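The fragment above is part of a larger condition that is not shown. As a sketch of how such a condition might sit in a complete program, the following uses lower inside a filter for a case-insensitive match. The class name, the helper `filterIgnoringCase`, and the sample values are hypothetical; only the column name field_name and the literal "offeringname" come from the fragment.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lower;

import java.util.Arrays;
import java.util.Locale;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class CaseInsensitiveFilter {

    // Hypothetical helper: keeps rows where `columnName` is not null and
    // matches `expected` once both sides are lowercased.
    static Dataset<Row> filterIgnoringCase(Dataset<Row> df, String columnName, String expected) {
        String expectedLower = expected.toLowerCase(Locale.ROOT);
        return df.filter(
                col(columnName).isNotNull()
                        .and(lower(col(columnName)).equalTo(expectedLower)));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("CaseInsensitiveFilter")
                .master("local[*]")
                .getOrCreate();

        StructType schema = new StructType().add("field_name", DataTypes.StringType);
        // Made-up sample values with mixed case
        Dataset<Row> df1 = spark.createDataFrame(
                Arrays.asList(
                        RowFactory.create("OfferingName"),
                        RowFactory.create("SomethingElse")),
                schema);

        filterIgnoringCase(df1, "field_name", "offeringname").show();

        spark.stop();
    }
}
```

The column is lowercased only for the comparison, so the rows that pass the filter keep their original casing.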

I read all the answers here and then tried it myself. For some reason I was stuck in IntelliJ IDEA for a couple of minutes until it resolved the import. If you hit the same glitch, accept the import that IntelliJ suggests when it flags an unknown symbol.

Good luck.
