
Below is the dataframe from which I need to derive the output.

+---+---+---+---+---+
|  A|  B|  C|  E|  F|
+---+---+---+---+---+
|  1|  2|  3|  5|  6|
+---+---+---+---+---+

Dataset<Row> set = spark.sql("select A as one, B as two, C as three, D as four, E as five, F as six from input");

I need to skip column 'D' if it is not present and move on to the next column, i.e. E.

If any of the columns is not present, the query should return null for that column and move on to the next column assignment.
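
In other words (my reading of the requirement, not stated explicitly above), with column D absent the statement should effectively behave as if it were:

Dataset<Row> set = spark.sql("select A as one, B as two, C as three, null as four, E as five, F as six from input");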

  • What output do you currently get? An error since you're selecting a column that doesn't exist? – OneCricketeer Apr 01 '22 at 12:53
  • @OneCricketeer Yes, I'm getting an error: cannot resolve column "D". – George Apr 01 '22 at 12:55
  • [This Question](https://stackoverflow.com/questions/35904136/how-do-i-detect-if-a-spark-dataframe-has-a-column) and [this question](https://stackoverflow.com/questions/16952442/select-columnvalue-if-the-column-exists-otherwise-null) detail how to work around missing columns, both in Scala Spark code and Spark sql, respectively. – tjheslin1 Apr 03 '22 at 07:51

1 Answer


You can first check which columns actually exist in the input, for example:

val input = spark.read.parquet(...)
val cols = input.columns // returns an Array[String]

and then format your query accordingly.
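
To make that concrete in the Java API the question uses (a minimal sketch: the view name `input` and the column-to-alias mapping are taken from the question, the rest is my assumption and not part of the original answer), you could build the select list from whichever columns exist and substitute null for the rest:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Columns actually present in the input view
Dataset<Row> input = spark.table("input");
List<String> cols = Arrays.asList(input.columns());

// Desired mapping: source column -> alias
String[][] wanted = {
    {"A", "one"}, {"B", "two"}, {"C", "three"},
    {"D", "four"}, {"E", "five"}, {"F", "six"}
};

// "col as alias" when the column exists, "null as alias" otherwise
String selectExpr = Arrays.stream(wanted)
    .map(p -> (cols.contains(p[0]) ? p[0] : "null") + " as " + p[1])
    .collect(Collectors.joining(", "));

Dataset<Row> set = spark.sql("select " + selectExpr + " from input");

With the sample data (where D is absent), selectExpr would come out as "A as one, B as two, C as three, null as four, E as five, F as six", so the query no longer fails to resolve D.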
