I am trying to add an empty column in between two columns in a dataframe select statement.

With the withColumn function I can only append the new column at the end, but I need the empty columns in the middle (as the 3rd and 6th columns), as shown below.

val product1 = product.select("_c1","_c2"," ","_c4", "_c5", "_c5", " ", "c6")

I tried calling withColumn inside the select statement, as shown below, but it gives this error:

val product1 = product.select("_c1","_c2",product.withColumn("NewCol",lit(None).cast("string")),"_c4", "_c5", "_c5", " ", "c6")

>error: overloaded method value select with alternatives:
  (col: String,cols: String*)org.apache.spark.sql.DataFrame <and>
  (cols: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame
 cannot be applied to (String, String, String, String, String, String, String, String, org.apache.spark.sql.DataFrame, String)

Please let me know if you have any suggestions. Thanks.

Yash_spark

1 Answer

When selecting columns from a dataframe, select accepts either strings (column names) or columns (of Column type) as input. From the documentation:

def select(col: String, cols: String*): DataFrame  
Selects a set of columns.
def select(cols: Column*): DataFrame  
Selects a set of column based expressions.

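However, the two cannot be mixed in a single call. The attempt in the question also fails because withColumn returns a DataFrame, not a Column, so its result cannot be passed to select at all. In this case, use the select overload that takes Column arguments. To get the column with a specific name, use the col function or $ (after importing the Spark implicits), and build the empty column with lit.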

val spark = SparkSession()....
import spark.implicits._
import org.apache.spark.sql.functions.lit  // needed for lit(...)

val product1 = product.select($"_c1", $"_c2", lit(" ").as("newCol1"), $"_c4", $"_c5", $"_c5", lit(" ").as("newCol2"), $"c6")
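If a real null column is wanted rather than a single-space string (as in the question's `lit(None).cast("string")` attempt), the same idea can probably be written with `col` and a typed null literal. This is only a minimal sketch: the `EmptyColumnSketch` object, the `emptyCol` and `withEmptyColumns` helpers, the toy data and the local master setting are all assumptions made for illustration, not part of the original code.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object EmptyColumnSketch {
  // Hypothetical helper: a string-typed column that is null in every row.
  def emptyCol(name: String): Column = lit(null).cast("string").as(name)

  // Every argument is a Column, so the select(cols: Column*) overload applies.
  def withEmptyColumns(df: DataFrame): DataFrame =
    df.select(
      col("_c1"), col("_c2"), emptyCol("newCol1"),
      col("_c4"), col("_c5"), emptyCol("newCol2"))

  def main(args: Array[String]): Unit = {
    // Assumed local session, only so the sketch runs on its own.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-in for the `product` dataframe from the question.
    val product = Seq(("a", "b", "c", "d")).toDF("_c1", "_c2", "_c4", "_c5")

    val product1 = withEmptyColumns(product)
    product1.printSchema()
    product1.show()

    spark.stop()
  }
}
```

Giving each empty column its own alias keeps the resulting column names distinct, which avoids ambiguity if the columns are referenced by name later.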
Shaido