
I am using the code from the question Flatten a DataFrame in Scala with different DataTypes inside .... to flatten a nested DataFrame, and I am getting the error below:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'alternateIdentificationQualifierCode' is ambiguous, could be: alternateIdentificationQualifierCode#2, alternateIdentificationQualifierCode#11.;
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:287)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:171)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4$$anonfun$26.apply(Analyzer.scala:470)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4$$anonfun$26.apply(Analyzer.scala:470)
    at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:470)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:466)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

Is there any way to rename columns on the fly programmatically in Spark DataFrames in Scala? Thanks in advance.

Code:

import java.io.FileInputStream
import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object flatten {

  def main(args: Array[String]) {

    if (args.length < 1) {
      System.err.println("Usage: XMLParser.jar <config.properties>")
      println("Please provide the Configuration File for the XML Parser Job")
      System.exit(1)
    }

    val sc = new SparkContext(new SparkConf().setAppName("Spark XML Process"))
    val sqlContext = new HiveContext(sc)

    // Job settings (XML row tag and input path) come from the properties file passed as args(0)
    val prop = new Properties()
    prop.load(new FileInputStream(args(0)))

    // Read the XML input with spark-xml, then flatten the nested schema
    // (flattenDf is the flattening function taken from the linked question)
    val dfSchema = sqlContext.read.format("com.databricks.spark.xml")
      .option("rowTag", prop.getProperty("xmltag"))
      .load(prop.getProperty("input"))
    val flattened_DataFrame = flattenDf(dfSchema)

    // flattened_DataFrame.printSchema()

  }

1 Answer


Use

val renamed_df = df.toDF(Seq("col1", "col2", "col3"): _*)

to rename all the columns in one go. toDF takes the new names as varargs, so a Seq has to be expanded with : _*; pass one name per column, in the same order as df.columns.
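Applied to the flattened DataFrame from the question, a minimal sketch (the index-suffix scheme and the names uniqueNames / renamed_df are just illustrative) that builds the new names programmatically so duplicates such as the two alternateIdentificationQualifierCode columns no longer collide:

// Derive unique column names by suffixing each existing name with its position
val uniqueNames = flattened_DataFrame.columns.zipWithIndex.map {
  case (name, idx) => s"${name}_$idx"
}

// toDF expects varargs, so expand the array with : _*
val renamed_df = flattened_DataFrame.toDF(uniqueNames: _*)
renamed_df.printSchema()

Note that if the AnalysisException is thrown inside flattenDf itself, the renaming has to be applied before the select that collides rather than after the flattening.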