
I have a DataFrame that looks like this:

[Image: input DataFrame]

For each row, I need to look at the columns item_1 through item_3, take the maximum value, and store it in a new column called item.

The output should look like this:

[Image: expected output with the new item column]

Which function should I use for this?

Nick Ryan
  • Welcome to SO! Please take a moment to read about how to post Spark questions: https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples – YOLO Feb 27 '20 at 10:39
  • Does this answer your question? [How to find the max value of multiple columns?](https://stackoverflow.com/questions/57531778/how-to-find-the-max-value-of-multiple-columns) – blackbishop Feb 28 '20 at 16:50

1 Answer


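The function you are looking for is `greatest` (in `pyspark.sql.functions`). It returns the row-wise maximum of the given columns, skipping null values.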

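from pyspark.sql.functions import greatest

# Add a column holding the largest of item_1, item_2 and item_3 for each row
df = df.withColumn(
    "item",
    greatest("item_1", "item_2", "item_3")
)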
Steven
  • Is there any way to do something like a "grep", i.e. select all the columns whose names contain "item_" and then apply greatest to them? There are a lot of columns with "item" in their name. – Nick Ryan Mar 01 '20 at 13:27
  • @NickRyan Are you familiar with Python? `a = [i for i in df.columns if "item_" in i]` and then `greatest(*a)` – Steven Mar 02 '20 at 08:49