15

Is there a nicer way to prefix or rename all (or several) columns of a given SparkSQL DataFrame at once than calling dataFrame.withColumnRenamed() multiple times?

An example would be detecting changes with a full outer join: I'm left with two DataFrames that have the same structure, so their column names collide.

JiriS

4 Answers

19

I suggest using the select() method for this. In fact, the withColumnRenamed() method itself uses select() internally. Here is an example of how to rename multiple columns:

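import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

val someDataframe: DataFrame = ...

// Build one aliased Column per source column, then rename them all in a single select.
val initialColumnNames = Seq("a", "b", "c")
val renamedColumns = initialColumnNames.map(name => col(name).as(s"renamed_$name"))
someDataframe.select(renamedColumns: _*)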
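Note that the select above returns only the columns listed in initialColumnNames. Here is a minimal PySpark sketch of the same single-select idea that keeps the remaining columns under their original names. The helpers `build_aliases` and `rename_columns` and the `mapping` dict are hypothetical names introduced for illustration, not part of Spark or of the answer.

```python
def build_aliases(columns, mapping):
    """Return (old, new) pairs for every column; unmapped columns keep their name."""
    return [(name, mapping.get(name, name)) for name in columns]


def rename_columns(df, mapping):
    """Rename several columns with one select (hypothetical helper, not a Spark API)."""
    # Imported lazily so build_aliases can be used without Spark installed.
    from pyspark.sql.functions import col

    pairs = build_aliases(df.columns, mapping)
    return df.select(*[col(old).alias(new) for old, new in pairs])
```

Usage would look like `rename_columns(df, {"a": "renamed_a", "b": "renamed_b"})`; column order is preserved because the pairs are built from `df.columns`.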
Zyoma
  • It's more about Java API and Spark. As an example `select` expects either one String parameter and then varargs or array of `Column`s which is not consistent and also sometimes a bit annoying to use. I had to create a few helper methods to deal with this problem, but would be better to have those methods directly available in `DataFrame`. – JiriS Nov 24 '15 at 13:56
  • There is another example [here](http://stackoverflow.com/questions/32535273/how-to-match-dataframe-column-names-to-scala-case-class-attributes) – Myles Baker Aug 02 '16 at 22:30
  • Did you try your code with Spark 2.0? I'm dealing with 7000 columns, see https://github.com/ramhiser/datamicroarray/wiki/Golub-(1999) . It takes forever (=never finished before my patience was over). – Boern Feb 07 '17 at 12:14
  • 1
    @JiriS do you have a java version of this? or is it we have to stick with the withColumnRenamed method? – Asiri Liyana Arachchi Apr 09 '19 at 03:59
2

I think this method can help you.

public static Dataset<Row> renameDataFrame(Dataset<Row> dataset) {
    for (String column : dataset.columns()) {
        dataset = dataset.withColumnRenamed(column, SystemUtils.underscoreToCamelCase(column));
    }
    return dataset;
}

// Converts snake_case to camelCase: drops each '_' and upper-cases the character after it.
public static String underscoreToCamelCase(String underscoreName) {
    StringBuilder result = new StringBuilder();
    if (underscoreName != null && underscoreName.length() > 0) {
        boolean upperNext = false;
        for (int i = 0; i < underscoreName.length(); i++) {
            char ch = underscoreName.charAt(i);
            if (ch == '_') {
                upperNext = true;
            } else if (upperNext) {
                result.append(Character.toUpperCase(ch));
                upperNext = false;
            } else {
                result.append(ch);
            }
        }
    }
    return result.toString();
}



Alsace
0

I have just found the answer:

from pyspark.sql.functions import col

df1_r = df1.select(*(col(x).alias(x + '_df1') for x in df1.columns))

on Stack Overflow here (see the end of the accepted answer).
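For the change-detection case from the question, the same aliasing can be applied to both DataFrames before the join. This is only a sketch: `suffixed_names` and `full_outer_with_suffixes` are hypothetical helpers, and it assumes both DataFrames share the join key columns passed in `keys`, which are left unsuffixed so they can be used to join.

```python
def suffixed_names(columns, suffix, keys=()):
    """Map each column to column + suffix, leaving the join keys untouched."""
    return {c: (c if c in keys else c + suffix) for c in columns}


def full_outer_with_suffixes(df1, df2, keys):
    """Full outer join of two same-schema DataFrames with disambiguated columns."""
    # Imported lazily so suffixed_names can be used without Spark installed.
    from pyspark.sql.functions import col

    def suffixed(df, suffix):
        names = suffixed_names(df.columns, suffix, keys)
        return df.select(*[col(c).alias(names[c]) for c in df.columns])

    left = suffixed(df1, "_df1")
    right = suffixed(df2, "_df2")
    return left.join(right, on=list(keys), how="full_outer")
```

After the join, a changed value shows up as a row where, for example, `value_df1` and `value_df2` differ, and a row present on only one side has nulls in the other side's columns.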

Community
lanenok
0
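// Start from the original DataFrame and accumulate the renames on the result.
var newsales_var = newsales
for (a <- newsales.columns.indices) {
  // Replace '(' with '_' and drop the trailing ')', e.g. "sum(x)" becomes "sum_x".
  val new_c = newsales.columns(a).replace('(', '_').replace(')', ' ').trim
  newsales_var = newsales_var.withColumnRenamed(newsales.columns(a), new_c)
}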
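The same cleanup of parentheses in column names can be sketched in PySpark without a loop over withColumnRenamed. The `sanitize` helper below is a hypothetical name that mirrors the replace/trim logic above; `toDF` is the existing DataFrame method that assigns a full list of new column names in one call.

```python
def sanitize(name):
    """Replace '(' with '_', turn ')' into a space, then strip surrounding whitespace."""
    return name.replace("(", "_").replace(")", " ").strip()


def sanitize_columns(df):
    """Return df with every column name sanitized (hypothetical helper)."""
    # toDF takes one new name per existing column, in order.
    return df.toDF(*[sanitize(c) for c in df.columns])
```

This swaps in `toDF` for the per-column renames, so it only fits when every column is renamed positionally, as it is here.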
Nagama Inamdar
Devndra
  • 2
    Please edit with more information. Code-only and "try this" answers are discouraged, because they contain no searchable content, and don't explain why someone should "try this". We make an effort here to be a resource for knowledge. – abarisone Jun 22 '16 at 14:23